Similarity is universally acknowledged to be central in transfer, but recent
research suggests that its role is complex. The present research attempts to isolate 
and compare the determinants of similarity-based access to memory and the 
determinants of the subjective soundness and similarity of a match. We predicted, 
based on structure-mapping theory, that subjective soundness would depend on 
the degree of shared relational structure, particularly higher-order structure such 
as causal bindings. In contrast, we predicted that memory retrieval would be 
highly sensitive to surface similarities such as common object attributes. To as- 
sess retrievability, in three studies, subjects were asked to read a large set of 
stories and were later given a set of probe stories that resembled the original 
stories in systematically different ways; e.g., purely relational analogies, surface- 
similarity matches, or overall (literal similarity) matches. Subjects were told to 
write out any of the original stories that came to mind. To assess subjective 
soundness, independent subjects (and also the same reminding subjects) were 
asked to rate the inferential soundness of each pair; i.e., how well inferences true
of one story would apply to the other. As predicted, subjective soundness was 
highly related to the degree of common relational structure, while retrievability 
was chiefly related to the degree of surface similarity. Ratings of the similarity of 
the pairs did not predict the retrievability ordering, arguing against the possibility 
that the retrieval ordering simply reflected overall similarity. Further, a fourth 
study demonstrated that subjects given a forced-choice recognition task could 
discriminate between possible matches on the basis of relational structure, ruling 
out the possibility that the poor relational retrieval resulted from forgetting or 
failing to encode the relational structure. We conclude that there is a dissociation 
between the similarity that governs access to long-term memory and that which is 
used in evaluating and reasoning from a present match. We describe a model, 
called MAC/FAC (‘‘Many are called but few are chosen’’), that uses a two-stage 
similarity retrieval process to model these findings. Finally, we speculate on the 
implications of this view for learning and transfer. © 1993 Academic Press, Inc. 


Analogy and similarity are central in learning and transfer. People solve 
problems better if they have solved prior similar problems (e.g., Ander- 
son, Farrell, & Sauers, 1984; Holyoak & Koh, 1987; Novick, 1988; Pirolli, 
1985; Reed, 1987; Ross, 1987, 1989) and these benefits extend to children 
as well as to adults (Brown & Kane, 1988; Chen & Daehler, 1989; 
Gholson, Eymard, Long, Morgan & Leeming, 1988). One of the most
enduring findings in the field is that similarity promotes reminding and 
transfer (Ellis, 1965; Osgood, 1949; Thorndike, 1903). 

However, although the relationship between similarity and transfer is 
strong, it is not simple. Recent research has confirmed the venerable 
finding that transfer increases with similarity (e.g., Anderson, Farrell, & 
Sauers, 1984; Holyoak & Koh, 1987; Novick, 1988; Pirolli, 1985; Reed, 
1987; Ross, 1987, 1989; Simon & Hayes, 1976) but has also brought home 
the complexity of similarity’s role in transfer. Three findings have 
emerged. First, accuracy of transfer depends critically on the degree of
structural match—e.g., match in causal structure (Schumacher & Gent- 
ner, 1988a,b; Holyoak & Koh, 1987) or in the principle applied (Novick, 
1988; Ross, 1984, 1987). Second, people often fail to access structurally 
appropriate materials, even when such materials are present in long-term 
memory. In Gick and Holyoak’s (1980, 1983) insightful series of studies, 
subjects were given Duncker’s (1945) radiation problem: how can one 
cure an inoperable tumor when enough radiation to kill the tumor would 
also kill the surrounding flesh? The solution, which is to converge on the 
tumor with several weak beams of radiation, is normally discovered by 
only about 10% of the subjects. But if given a prior analogous story in 
which soldiers converged on a fort, three times as many subjects (about 
30%) produced the correct answer. This indicates that spontaneous ana- 
logical transfer can occur to good effect. But it also reveals a limitation, 
for the great majority of the subjects failed to retrieve the analogous story. 
Yet, when given a general hint that the story might be relevant, about 75%
solved the problem, indicating that the failure was not of storage but of
access.¹

Third, similarity-based remindings are often based on superficial com- 
monalities as well as, or instead of, structural commonalities (Holyoak & 
Koh, 1987; Novick, 1988; Reed, Ernst, & Banerji, 1974). For example, 
Ross (1984) taught subjects six probability principles in the context of 
different story lines (e.g., a drunk and his keys). Subjects were then tested 
on problems whose story line was similar to that of the appropriate prin- 
ciple, similar to that of an inappropriate principle, or unrelated to any of 
the study problems. Compared to the unrelated baseline, surface similar- 
ity in story lines improved performance when linked to an appropriate 
principle and hurt performance when linked to an inappropriate principle. 
In later research, Ross (1989) found that similarities in story lines had a 
large influence on the probability of accessing the prior problem, but little 
influence on the probability of correctly applying the principle. 

Such findings lead inescapably to the conclusion that a finer-grained 
analysis of similarity is necessary to capture its role in transfer. Our
research is directed at three simple questions: (1) what kinds of similarity 
do people think constitute a good match?; (2) what kinds of similarity 
promote access to long-term memory?; and (3) how do the answers to (1)
relate to the answers to (2)? We first lay out the theoretical distinctions
necessary to proceed. Then we present four experiments² and relate their
findings to a simulation of similarity-based access and inference. 

To achieve a sufficiently precise vocabulary, we must decompose
similarity into different classes and we must decompose transfer into multiple
subprocesses. We begin with similarity. Our decomposition of similarity
uses the distinctions of structure-mapping theory (Gentner, 1980, 1983, 
1989a). First, analogy is defined as similarity in relational structure; more 
specifically, analogy is a one-to-one mapping from one domain represen- 
tation (the base) into another (the target) that conveys that a system of 
relations that holds among the base objects also holds among the target 
objects independently of any similarities among the objects to which 
those relations apply.³ An important assumption is that a further selection


1 It has been pointed out (e.g., Holyoak & Thagard, 1989a; Keane, 1985; Wolstencroft,
1989) that this analogy is structurally imperfect, since the general’s goal is to protect his 
troops from their surround (the populace) whereas the surgeon’s is to protect the surround 
(the flesh) from the rays. However, Keane’s replication, which addressed this issue, pro- 
duced much the same results. 

2 Experiments 1, 2, and 3 have been presented at conferences (Gentner & Landers, 1985;
Rattermann & Gentner, 1987) but have never been fully published. One purpose of the 
present paper is to present these findings completely. 

3 This represents an ideal competence model of analogy—a description at Marr’s (1982) 
TABLE 1
Examples of Different Similarity Types

Literal Similarity                      Milk is like water
Analogy                                 Heat is like water
Mere Appearance (Surface Similarity)    The glass tabletop gleamed like water


is made: among the common relations that could participate in the anal- 
ogy, people prefer to focus on interconnected systems of relations (the 
Systematicity principle). That is, they prefer to map sets of relations that 
include common higher-order relations that constrain the common lower- 
order relations. Given these distinctions, we can distinguish further 
classes of similarity according to the kinds of predicates that are shared. 
In analogy, only relational predicates—low-order and higher-order—are 
shared. In literal similarity (overall similarity), both relational predicates 
and object-attributes are shared. In surface matches (or mere-appearance 
matches), only object-attributes and low-order relations are shared. Table 
1 gives examples of these different kinds of similarity. Although these 
distinctions are continua, not dichotomies, it is nonetheless useful to lay 
out the dimensions. 
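
As a rough illustration, the three classes can be read as a lookup over which kinds of predicates two situations share. The sketch below is not part of the formal machinery of structure-mapping theory; the function name `similarity_type` and the boolean flags are assumptions made for illustration, and treating each kind of overlap as all-or-none ignores the fact that the distinctions are continua.

```python
def similarity_type(shares_attributes, shares_first_order, shares_higher_order):
    """Label a match by the kinds of predicates the two situations share.

    Hypothetical helper: each flag says whether that level of predicate
    is (largely) shared between base and target.
    """
    if shares_first_order and shares_higher_order:
        # Relational structure is shared at both levels.
        if shares_attributes:
            return "literal similarity"   # relations plus object attributes
        return "analogy"                  # relations only
    if shares_attributes and shares_first_order:
        return "surface match"            # attributes plus low-order relations
    return "unclassified"                 # combinations the text does not name here


# "Heat is like water": shared relational structure, different objects.
print(similarity_type(False, True, True))
# "Milk is like water": objects and relations both match.
print(similarity_type(True, True, True))
```

The final branch is deliberately left open, since the passage above names only three classes.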

The canonical situation for similarity-based transfer is that a person has 
some current topic in working memory and is reminded of some similar 
situation stored in long-term memory. There is general consensus that 
similarity-based transfer involves (1) accessing a prior potential analogy, 
(2) matching the prior (base) analog with the target, (3) mapping further 
inferences from the base to the target, (4) (possibly) adapting the infer- 
ences to fit the target, (5) evaluating the soundness of the analogy, and (6) 
(possibly) extracting the common structure for later use (J. Clement, 
1986; Gentner, 1988a; Gentner & Landers, 1985; Hall, 1989; Holyoak & 
Thagard, 1989b; Keane, 1985; Kedar-Cabelli, 1988; Novick, 1988; 
Thagard, 1988; Winston, 1982). On this account, similarity-based transfer 
involves from four to six subprocesses: accessing, matching, evaluating, 
drawing inferences, and often adaptation of the analogy and abstraction 
from the analogy. 

This brings us to the core claims of this paper. We suggest that different 
subprocesses utilize different kinds of similarity. Specifically, structure- 
mapping theory predicts that evaluations of the soundness of a match 
should be based on the degree of overlap in relational structure. In con- 
trast, we predict that similarity-based access to long-term memory will be 
‘‘computational level’’ or Palmer and Kimchi’s (1985) ‘‘informational-constraints level.’’
For a discussion of performance factors in analogical processing, see Gentner (1989a) or 
Schumacher and Gentner (1988a,b). 
much more influenced by surface similarity than by overlap in relational
structure. There will thus be a dissociation between the similarity types 
used at different subprocesses of transfer. After presenting the empirical 
evidence, we describe a simulation called MAC/FAC (‘‘Many are called
but few are chosen’’) that uses a two-stage similarity retrieval process to
model these theoretical claims. 

This prediction contrasts with two prominent positions on memory 
retrieval. First, the case-based-reasoning view states that items are in- 
dexed in memory via shared abstractions such as common goals and 
causal structures (e.g., Hammond, 1989; Keane, 1988; Schank, 1982). On 
this view, common higher-order structure should be the best retrieval 
probe. The second contrasting position, implicit in many psychological 
models of memory retrieval, is that there is a unitary notion of similarity 
on which transfer depends (e.g., Gillund & Shiffrin, 1984; Hintzman, 
1984). In this case, the best predictor of retrievability should be overall 
similarity. 

What do we know so far concerning the specificity of similarity types 
with respect to transfer subprocesses? Recent research in problem-solving
and transfer has gone well beyond the stage of simply studying
overall similarity in transfer and has made progress in tracing the details 
of the transfer processes. For example, Ross (1989), in a study of transfer
in problem-solving, measured not only the proportion of correct solu- 
tions, but also the proportion of remindings, as measured by whether 
subjects wrote out a prior formula, regardless of whether they solved the 
problem correctly. This allows a contrast between solution rate—a mea- 
sure which presumably includes mapping, adaptation, evaluation, and 
drawing inferences—and reminding rate. Ross found that reminding rate 
was relatively strongly affected by surface similarity and solution rate by 
structural similarity. Novick (1988) gave novice and expert mathemati- 
cians study problems that included both surface-similarity distractors and 
remote analogs, followed by a later target problem. Although both experts 
and novices often initially retrieved surface-similar distractors, experts 
were quicker to reject initially incorrect retrievals, suggesting stronger 
effects of domain knowledge in mapping and evaluation than in retrieval. 
Keane (1988) also found stronger effects of surface similarity on retrieval 
than on mapping and use. He carried out a series of studies manipulating 
aspects of the Gick and Holyoak (1980, 1983) convergence problem dis- 
cussed earlier and demonstrated that a literally similar prior story (i.e., a 
story about a surgeon) was retrieved much more often than the remote 
(general and army) analog, but that subjects were equally good at mapping 
from the prior story to the target problem once both were present. Ho- 
lyoak and Koh (1987) also manipulated the convergence problem: they 
gave subjects one of four prior versions of the convergence problem, 
varying the structural and surface similarity to the target problem (the
radiation problem). They found that spontaneous access was influenced 
about equally by surface similarity and structural similarity, but that sub- 
jects’ ability to successfully perform the mapping was influenced only by 
structural similarity. 

The evidence thus suggests that surface similarity is more important at 
access than at later stages. In fact, the difference may be even greater 
than these results suggest. In all these studies, subjects’ remindings were
elicited in the course of a problem-solving task, and so may have been 
filtered by the subjects’ assessment of their usefulness. In this case the 
measured reminding rate reflects a combination of access, mapping and 
evaluation. Another complication is that, as Keane (1987) points out, 
findings from the problem-solving literature may underestimate people’s 
abilities to retrieve and process analogies because subjects may lack the 
domain knowledge required to recognize a correct analogy. 

In analogical mapping, there is considerable evidence that mapping 
accuracy is influenced by the consistency and systematicity of an analogy,
whether in problem-solving (Bassok & Holyoak, in press; Keane,
1987; Novick, 1988; Reed, 1987; Ross, 1984, 1987), in story transfer (Gent- 
ner & Toupin, 1986) or in transfer of mental models from one device to 
another (Collins & Gentner, 1987; Gentner & Gentner, 1983; Schumacher 
& Gentner, 1988). Clement and Gentner (1991) found a focus on common 
systems in processing novel explanatory analogies. Asked to choose 
which facts were most relevant to the analogies, the subjects attended not 
just to whether the facts matched, nor to how important the facts were, 
but to whether the matching facts were connected to other matching
facts. The same preference for common interconnected systems held for 
making predictions from an analogy. Given an analogy that allowed two 
possible predictions—i.e., that had two facts present in the base but not 
the target—subjects predicted the fact that was connected to a common 
relational system. 

But if common relational structure is important in successful transfer, 
and if, as we will argue, retrieval processes often produce poor matches 
as well as good ones, it is important to know whether people can tell good 
analogies from poor ones. That is, do people weight structural consis- 
tency and systematicity strongly when evaluating the soundness of an 
analogy? This is a key prediction of structure-mapping theory (Gentner, 
1983, 1989). There is not much direct evidence on this point. Gentner and 
Clement (1988) found that common relational structure is important in 
judging the aptness of metaphors.⁴ People rate comparisons based on


4 Although the literature on metaphor is strangely disconnected from that on analogy,
there is a close connection between the phenomena. Many of the metaphors and similes 
used in current research could as well be called analogies. In particular, relational meta- 
shared relations (e.g., ‘‘A camera is like a tape recorder.’’) as more apt
than those based on shared attributes (e.g., ‘‘The sun is like an orange.’’)
and their aptness ratings are correlated with the degree to which their 
interpretations include relational (but not attributional) information (Gent- 
ner, 1988b; Gentner & Clement, 1988; Gentner, Falkenhainer, & Skor- 
stad, 1988). But these findings address metaphoric aptness, not analogical 
soundness, and further, although they demonstrate a preference for
shared relations, they do not address the more specific claim of a prefer- 
ence for common systems of relations governed by higher-order relations 
(the systematicity claim). Moreover, a contrary result was found by Reed 
(1987). He found that subjects weighted surface similarities strongly when 
asked to rate the potential usefulness of one mathematical problem for 
solving another. In some cases they even rated surface-similar but disan- 
alogous problems as more useful than analogous, surface-dissimilar prob- 
lems, suggesting that structural similarity was not their criterion for a 
useful analogy. However, usefulness may not be equivalent to soundness; 
subjects may have felt that a surface-similar problem would be more 
useful to them in a task because they lacked the domain knowledge nec- 
essary to apply a non-transparent match. Further, returning to Keane’s 
(1987) point, the subjects may not have had the requisite domain knowl- 
edge to recognize a structurally sound analogy. Novick’s (1988) studies 
show that expert mathematicians are considerably better than novices at 
discarding inappropriate retrievals, suggesting that with sufficient knowl- 
edge people may be able to judge the soundness of a mapping. 

In the present research, we attempted to design clear tests of the kinds 
of similarity involved in similarity-based access and in soundness evalu- 
ation. We also strove to decompose levels of commonality precisely and 
to do so for relatively large stimulus sets. Much of the existing literature 
is based on very small sample sizes, often from one to four stories. Aside 
from the problems of generalizing over small samples, to study the phe- 
nomena of memory retrieval properly would seem to require moderately 
large sample sizes. 

The basic method was the same across all three studies. We first cre- 
ated sets of story pairs differing systematically in their similarity class, as 
described below. To investigate the access process, we used a story- 
memory task. Subjects read a set of stories and later wrote out any re- 
mindings they experienced when reading a second set of similar stories. 
In contrast to the problem-solving context, here there is no additional 
criterion for the memory task beyond being reminded. This minimizes the 


phors/similes, such as ‘‘Sermons are like sleeping pills.’’ (Ortony, 1979), can readily be
analyzed as analogies (Gentner, 1982; Gentner & Clement, 1988; Gentner, Falkenhainer & 
Skorstad, 1987). 
contribution of other subprocesses, such as mapping and evaluation, and
comes close to being a pure measure of access. To investigate the eval- 
uation process, we presented subjects with the pairs of stories used in the 
memory task and asked them to rate the soundness (and in later studies, 
the similarity) of the pairs. This should require only mapping and evalu- 
ation. 

The story pairs were designed as combinations of different levels of 
predicate matches. Matches could occur at three levels of predicates: (1) 
object attributes, such as countries/nations; (2) first-order relations, such 
as events and other relations between objects (e.g., X shooting at Y/
P firing at Q); and (3) higher-order relational structure, such as causal 
relations or other kinds of plot structure. By the systematicity assump- 
tion, the matches should be considered more sound when they share 
higher-order relational structure than when they do not. Note that it is not 
enough for the stories simply to contain the same higher-order relations: 
they must also have like bindings to lower-order relations: e.g., (1) is 
analogous to (2) but not to (3) (where A, B, C and x, y, z are events).


(1) A causes B, B causes C. 
(2) x causes y, y causes z. 
(3) x causes y, z causes x.
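
The binding requirement can be made concrete with a small sketch. It assumes that the first-order matches have already fixed the event correspondences (A with x, B with y, C with z); the tuple representation and the function name `same_bindings` are hypothetical conveniences, not notation from structure-mapping theory.

```python
def same_bindings(base, target, correspondence):
    """Check that higher-order relations bind corresponding events alike.

    base, target: collections of (relation, argument1, argument2) tuples.
    correspondence: dict mapping each base event to its matching target event,
    assumed here to be given by the first-order matches.
    """
    carried_over = {
        (relation, correspondence[arg1], correspondence[arg2])
        for relation, arg1, arg2 in base
    }
    return carried_over == set(target)


story1 = [("causes", "A", "B"), ("causes", "B", "C")]
story2 = [("causes", "x", "y"), ("causes", "y", "z")]
story3 = [("causes", "x", "y"), ("causes", "z", "x")]
events = {"A": "x", "B": "y", "C": "z"}

print(same_bindings(story1, story2, events))  # True: like bindings
print(same_bindings(story1, story3, events))  # False: same relation, different bindings
```

Story (3) is itself a causal chain (z, then x, then y); it fails the check only because that chain does not line up with the event correspondences, which is the sense in which merely containing the same higher-order relation is not enough.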


These three levels of matches—objects, first-order relations, and 
higher-order relational structure—were combined to yield different kinds 
of story matches. In the first experiment,⁵ the story matches were surface
matches, which matched in object descriptions and first-order relations; 
analogies, which matched in first-order and higher-order relations; and 
FOR matches, which matched only in first-order relations. Subjects were 
told that if the new story reminded them of any of the original stories they 
should write out that original story in as much detail as possible. After the 
reminding task, subjects rated the soundness of each pair of original and 
cue stories. Soundness was explained as the degree to which inferences 
from one story would hold for the other. 

According to structure-mapping, the subjective soundness of an anal- 
ogy should depend on the degree and depth of the common relational 
structure and not on the amount of overlap in object-attributes. There- 
fore, analogical matches should be rated as more sound than either sur- 
face matches or FOR matches. In contrast, the surface superiority claim 
for access predicts that access should depend much more heavily on the 
degree of object matches and event matches than on the degree of match 
in relational structure. Thus the access and soundness tasks should show 


5 The first study was carried out by Russell Landers in 1984 as an M.I.T. undergraduate 
honors project and appears in the Proceedings of the IEEE (Gentner and Landers, 1985).
different orderings of similarity types. In contrast, if there is one unitary
kind of similarity on which both these processes depend, soundness and 
access should show the same ordering of similarity types. 


EXPERIMENT 1 


Method 


Subjects 


The subjects were 30 students from the MIT Psychology Department. 


Design and Materials 


Similarity type was varied within subjects, with each subject receiving 1/3 analogies, 1/3
surface matches, and 1/3 FOR matches. For counterbalancing, subjects were divided into
three groups that differed in which story pairs they received in each Similarity type. For this
purpose, the 18 stories were divided into three sets of six. This gave a 3 × 3 Group (between)
× Similarity type (within) design.
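
One way such a counterbalancing scheme could be realized is a Latin-square rotation of cue types over the three story sets. The text does not say how the assignment was actually made, so the rotation rule and the names `CUE_TYPES` and `cue_type` below are assumptions for illustration only.

```python
# Cue types: analogy, surface-similarity match, first-order-relations-only match.
CUE_TYPES = ["AN", "SS", "FOR"]


def cue_type(group, story):
    """Cue type shown to a subject in `group` (0-2) for original story `story` (0-17).

    Assumed rule: the 18 stories form three sets of six, and each group
    sees the cue types rotated by one position across the sets.
    """
    story_set = story // 6
    return CUE_TYPES[(group + story_set) % 3]


workbooks = {g: [cue_type(g, s) for s in range(18)] for g in range(3)}
for group, cues in workbooks.items():
    # Each group's workbook should hold six cues of every type.
    print(group, {t: cues.count(t) for t in CUE_TYPES})
```

Under this rotation every subject receives six cues of each type, and every original story is probed by all three cue types across the groups, which is what the 3 × 3 design requires.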

The dependent measures for the reminding task were three measures of subjects’ recall of 
the original stories: judges’ ratings of their recalls, proportion of recalls rated above crite- 
rion, and proportion of recalls of a predefined keyword. The dependent measure for the 
soundness task was subjects’ ratings of the soundness of the matches. 

Materials. The materials were 18 sets of stories, with four stories per set, plus 14 filler 
stories. The stories were two or three paragraphs long. Each of the 18 story sets contained 
an original story plus three matching cue stories, which differed in the amount and level of 
similarity they shared with the original (see Table 2 and Appendix for sample sets of stories
from Experiments 1 and 2). All cues shared identical or nearly identical first-order relations 
(e.g., events and actions) with the original story. They differed in the other levels of shared 
similarity. 

In analogy (AN) cues, common higher-order relational structure was added to the first- 
order relation matches. The objects (i.e., the characters, physical objects, and locations) 
differed. 

In surface-similarity match (SS) cues, object matches were added to the first-order
relation matches. The higher-order relational structure (i.e., the causal structure or plot
structure) differed.

In FOR match (FOR) cues, only the first-order relations matched. The objects as well as 
the higher-order structure differed.

Each subject received only one matching cue story for each of the 18 original stories. All 
subjects received the same memory set; the difference was in the type of story used in the 
cue set. This ensured that the measure of reminding would not be contaminated by differ- 
ential availability of the memory material. To ensure comparability, the cue stories in a 
given set were made as similar as possible, subject to the constraints of the theory. In 
particular, the AN and FOR cues were constructed to have the same objects, and the SS and
FOR cues were constructed to have the same higher-order structure. The objects were
people, animals, companies, and countries. To avoid purely lexical remindings, the use of 
identical words between original and cue was avoided; similarities were expressed by means 
of synonyms or closely similar words. (The exceptions were function words and certain 
connectives and prepositions that had no acceptable substitutes.) For each story, one to 
TABLE 2
Sample Stimuli from Experiments 1 and 2 


Base Story 


Karla, an old hawk, lived at the top of a tall oak tree. One afternoon, she saw a hunter 
on the ground with a bow and some crude arrows that had no feathers. The hunter took aim 
and shot at the hawk but missed. Karla knew the hunter wanted her feathers so she glided 
down to the hunter and offered to give him a few. The hunter was so grateful that he 
pledged never to shoot at a hawk again. He went off and shot deer instead. 


Literal-similarity match 
Once there was an eagle named Zerdia who nested on a rocky cliff. One day she saw a 
sportsman coming with a crossbow and some bolts that had no feathers. The sportsman 
attacked but the bolts missed. Zerdia realized that the sportsman wanted her tailfeathers so 
she flew down and donated a few of her tailfeathers to the sportsman. The sportsman was 
pleased. He promised never to attack eagles again. 


Analogy match 


Once there was a small country called Zerdia that learned to make the world’s smartest 
computer. One day Zerdia was attacked by its warlike neighbor, Gagrach. But the missiles 
were badly aimed and the attack failed. The Zerdian government realized that Gagrach 
wanted Zerdian computers so it offered to sell some of its computers to the country. The 
government of Gagrach was very pleased. It promised never to attack Zerdia again. 


Surface-similarity match 


Once there was an eagle named Zerdia who donated a few of her tailfeathers to a 
sportsman so he would promise never to attack eagles. One day Zerdia was nesting high on 
a rocky cliff when she saw the sportsman coming with a crossbow. Zerdia flew down to
meet the man, but he attacked and felled her with a single bolt. As she fluttered to the 
ground Zerdia realized that the bolt had her own tailfeathers on it. 


FOR Match 


Once there was a small country called Zerdia that learned to make the world’s smartest 
computer. Zerdia sold one of its supercomputers to its neighbor, Gagrach, so Gagrach 
would promise never to attack Zerdia. But one day Zerdia was overwhelmed by a surprise 
attack from Gagrach. As it capitulated the crippled government of Zerdia realized that the
attacker’s missiles had been guided by Zerdian supercomputers. 


three of the characters were given proper names. These always differed between original 
and cue but were similar or identical for all three cue stories.⁶

Finally, following a technique used by Read (1983, 1984), the final sentence in each 
original story was not paralleled in any of the cue stories. This sentence served as a pure 
memory test; it could not be reconstructed by backwards mapping from the cue stories. It 


6 In some story sets, the sexes of the AN and FOR characters were changed from those
of the original to heighten the lower-order differences. (The SS characters were, of course, 
always the same sex as the original.) In these cases, the names of the cue characters were 
kept as similar as possible: e.g., Mr. Boyce and Ms. Boyce, Christian and Christine, and 
Sidney and Cindy. 
was designed so that it could not be predicted from the rest of the original, did not alter the
plot structure, and contained at least one distinct new word or concept (the keyword), 
whose presence could be easily detected in the recall. For example, in the ‘‘Karla the hawk’’
set shown in Table 2, the final original sentence is ‘‘He went off and shot deer instead.’’
with deer the unique word. 


Procedure 


Reminding task. Subjects were tested in groups of three to eight in two separate sessions. 
In the first session, subjects read a booklet containing the 18 original stories intermixed with 
14 filler stories. (The first three and last three stories were always fillers.) All subjects read 
the same 32 stories, in different semi-random orders. They were told to read the stories 
carefully, so that they would be able to remember them a week later. Subjects took about 
30 min for this task. 

The second session took place 6 to 8 days later. Subjects received a workbook containing 
18 cue stories that corresponded to the 18 original stories read in the previous session. Each 
workbook consisted of six SS cues, six AN cues, and six FOR cues. Subjects were told that 
for each cue story they should write down any original story of which they were reminded. 
If they were reminded of more than one story, they were to write the one that best matched 
the current story. They were told to include as many details as they could remember—if 
possible, the names of characters and their motives as well as the events. 

Soundness-rating task. Following completion of the reminding task in the second session, 
the subjects were given a soundness-rating task. In this task they were given the same pairs 
of stories they had received in the reminding task (regardless of whether the subject had 
succeeded in the recall task). Thus each subject rated six SS, six AN, and six FOR matches
in the same counterbalancing groups as in the reminding task. They were asked to rate each 
pair for the inferential soundness of the match. In explaining the task, we avoided using 
terms like ‘‘analogy’’ or ‘‘analogous.’’ Instead, soundness was explained as follows: ‘‘This
part of the experiment is about what makes a good match between two stories or situations.
We all have intuitions about these things. Some kinds of resemblances seem important,
while others seem weak or irrelevant. . . .’’ We then described a sound match as one in
which ‘‘two situations match well enough to make a strong argument,’’ and in which ‘‘the
essential aspects of the stories match . . .’’ so that ‘‘you can draw conclusions about the
second story from the first.’’ Subjects were told that the opposite of a sound match is a
spurious match, in which ‘‘the resemblance between the stories is superficial.’’ They were
told that if they could ‘‘infer or predict much of the second story from the first . . . then this
is a sound match. If you could not, then this is a weak, or spurious match.’’


Scoring 


The subjects’ written recalls were scored by two judges. The judges were provided with 
a list of the 18 original stories, and for each of the subjects’ recalls, the judges were told 
which original story was the intended match. However, they were not told which cue story 
the subject had seen; that is, they were blind to the Similarity condition. Three kinds of 
reminding scores were obtained: 

(1) Overall score. The judges rated how close the subject’s recalled version was to the 
actual original story, using a 0-5 scale, as follows: 


5 = All important elements of the original and many details. 
4 = All important elements of the original and some details. 
3 = All important elements of the original but very few or no details. 
2 = Some important elements of the original; others missing or wrong. 
1 = Some elements from the original but not enough to be certain that the subject 
    genuinely recalled the original. 
0 = No recall or different story. 


(2) Proportion of remindings (i.e., number above criterion). The 0-5 recall score reflects the 
quality of the recalls. This could lead to confounds if some kinds of similarity matches 
permit more accurate recalls (through reconstruction). Therefore, as a second dependent 
measure, we dichotomized the judges’ scores. All responses that were judged to be clearly 
identifiable recalls of the original (i.e., recalls that received an overall score of 2 or better) 
were classified as remindings; all descriptions with scores of 1 or 0 were classified as 
non-remindings. This yielded a measure of the number of remindings each subject produced 
for each of the three similarity types. 
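
As a minimal sketch of the dichotomization described above, the following Python snippet converts 0-5 overall scores into remindings using the criterion of 2 or better and computes the proportion of remindings per similarity type. The function names and the example scores are hypothetical, invented only for illustration; they are not data from the experiment.

```python
from collections import defaultdict

CRITERION = 2  # overall scores of 2 or better count as remindings (per the text)


def is_reminding(overall_score: int) -> bool:
    """Dichotomize a 0-5 overall recall score."""
    return overall_score >= CRITERION


def proportion_remindings(scored_recalls):
    """scored_recalls: iterable of (similarity_type, overall_score) pairs.

    Returns {similarity_type: proportion of recalls at or above criterion}.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for sim_type, score in scored_recalls:
        totals[sim_type] += 1
        hits[sim_type] += is_reminding(score)
    return {t: hits[t] / totals[t] for t in totals}


# Hypothetical scores for one subject (six cues per type); not real data.
example = (
    [("SS", s) for s in (5, 3, 2, 4, 1, 2)]
    + [("AN", s) for s in (0, 2, 1, 3, 0, 0)]
    + [("FOR", s) for s in (0, 0, 2, 1, 0, 0)]
)
proportions = proportion_remindings(example)
print(proportions)
```

Because the dichotomized measure ignores how detailed a recall is, it is less sensitive than the 0-5 score to any reconstruction from the cue story.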

(3) Keyword score. As discussed above, each original story contained a final key sentence 
that was not paralleled in the cue stories. To facilitate scoring, these sentences each 
contained a distinctive word or concept. Subjects’ descriptions were scored as 2 if they 
contained the keyword or a close synonym, 1 if they contained a dubious synonym, and 0 
otherwise. 

The independent scores assigned by the two judges agreed 75.5% of the time and were 
within one point of each other 97.5% of the time. Disagreements were resolved by 
discussion. 
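
The two agreement figures reported above can be computed as in the following sketch. It assumes each judge's scores are stored as a list aligned by recall; the function name and the example scores are hypothetical and do not come from the study.

```python
def agreement_rates(judge_a, judge_b):
    """Return (exact agreement, agreement within one point) as proportions."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(judge_a)
    exact = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    within_one = sum(abs(a - b) <= 1 for a, b in zip(judge_a, judge_b)) / n
    return exact, within_one


# Hypothetical 0-5 scores from two judges for eight recalls; not real data.
judge_a = [5, 3, 2, 0, 4, 1, 2, 0]
judge_b = [5, 2, 2, 0, 4, 3, 2, 1]
exact, within_one = agreement_rates(judge_a, judge_b)
print(f"exact agreement: {exact:.3f}, within one point: {within_one:.3f}")
```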


Results and Discussion 
Soundness 


As predicted, subjects judged the pairs that shared higher-order rela- 
tional structure (the analogy matches) to be significantly more sound than 
the pairs that did not (the FOR matches and surface matches). Figure 1a 
shows the mean soundness ratings for the three types of similarity 
matches. 

A one-way analysis of variance confirmed a main effect of Similarity 
type, F(2,58) = 70.51, p < .0001. Planned comparisons using the 
Bonferroni adjustment (α = .05)⁷ showed a significant advantage for AN 
matches (M = 4.40) over SS (M = 2.80) and FOR matches (M = 2.74) 
and no significant difference between SS and FOR matches. A one-way 
ANOVA using items as the random variable also revealed a significant 
effect of Similarity types, F(2,34) = 46.31, p < .0001. 
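
For readers unfamiliar with the Bonferroni adjustment, the sketch below shows the basic arithmetic: the familywise α is divided by the number of comparisons, and each comparison is tested against that stricter threshold. The paper does not state how many comparisons formed each family, so treating the three pairwise contrasts as one family is an assumption, and the p values are hypothetical.

```python
from itertools import combinations


def bonferroni_alpha(family_alpha: float, n_comparisons: int) -> float:
    """Per-comparison threshold under the Bonferroni adjustment."""
    return family_alpha / n_comparisons


def significant(p_values: dict, family_alpha: float = 0.05) -> dict:
    """Flag each comparison whose p value falls below the adjusted threshold."""
    threshold = bonferroni_alpha(family_alpha, len(p_values))
    return {pair: p < threshold for pair, p in p_values.items()}


conditions = ["AN", "SS", "FOR"]
pairs = list(combinations(conditions, 2))  # assumed family: all pairwise contrasts
alpha_per_test = bonferroni_alpha(0.05, len(pairs))

# Hypothetical p values for illustration only; not taken from the paper.
hypothetical_p = {("AN", "SS"): 0.001, ("AN", "FOR"): 0.0005, ("SS", "FOR"): 0.40}
decisions = significant(hypothetical_p)
print(alpha_per_test, decisions)
```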


Reminding 


The results of the reminding task show a different pattern. As predicted 
by the surface superiority hypothesis, the SS matches were the most 
effective reminding cues. On all three measures of reminding, the AN 
matches were much less effective than the SS matches. Figure 1b shows 
the proportion of stories recalled (i.e., the proportion of recalls assigned 
a score of 2 or better) for each type of cue. A one-way analysis of variance 


⁷ All planned comparisons in this paper were performed using the Bonferroni adjustment 
at α = .05. 

Fig. 1. (a) Mean soundness ratings for the three similarity types in Experiment 1. (b) 
Proportion recalled for the three similarity types in Experiment 1. 


confirmed a main effect of Similarity type, F(2,58) = 75.45, p < .0001. 
Planned comparisons showed a significant advantage in proportion of 
remindings for SS matches (M = .78) over AN matches (M = .44) and for 
AN matches over FOR matches (M = .25). A one-way analysis of Similarity 
type over items again confirmed the main effect of Similarity type, 
F(2,34) = 31.80, p < .0001. 

The other two measures of reminding showed a similar pattern to the 
proportion recalled. Table 3 shows the mean ratings of quality of recall 

TABLE 3 
Results of Experiment 1: Mean Rating of Soundness Compared to Three Measures of 
Reminding across the Three Similarity Types 

                                               Match type 
                                   Analogy      SS matches   FOR matches 
Measure                            (R1 + R2)ᵃ   (OA + R1)    (R1) 
Soundnessᵇ                         4.40         2.80         2.74 
Proportion recalled                .44          .78          .25 
Proportion of keywords recalled    .21          .26          .13 
Quality of reminding               1.48         2.36         .86 

ᵃ Here OA refers to object commonalities, R1 to first-order relational commonalities, and 
R2 to higher-order relational commonalities. 
ᵇ Soundness and quality of reminding are mean ratings on 1 (low) to 5 (high) scales. 


and the mean keyword scores across the three match types. For compar- 
ison, the proportion recalled and the mean soundness ratings are also 
given. 

Similar analyses of the judges’ ratings of quality of recall revealed the 
same pattern as for proportion recalled. There is a main effect of Simi- 
larity type, F(2,58) = 72.81, p < .0001. The mean rating for SS (M = 
2.36) exceeded that for AN (M = 1.48), which exceeded that for FOR (M 
= .86). Again, the items analysis also showed a significant effect of Sim- 
ilarity type, F(2,34) = 24.79, p < .0001. The keyword analyses revealed 
weaker effects in the same direction. There was a marginal main effect of 
Similarity type in the subjects analysis, F(2,58) = 2.80, p < .06, but not 
in the items analysis. 

Overall, the reminding results present a uniform picture. Not surpris- 
ingly, there is an effect of sheer degree of match. Starting with a first- 
order match, accessibility was increased either by adding common object- 
attributes (since SS was superior to FOR) or by adding higher-order re- 
lational structure (since AN was superior to FOR). However, the 
advantage was greater for adding object descriptions. Surface matches 
produced the greatest proportion of remindings, followed by AN and then 
by FOR matches, supporting surface superiority in retrieval. 

This pattern differs markedly from the pattern found for subjective 
soundness. Indeed, the two measures were uncorrelated, r(52) = .15, NS. 
For subjective soundness, as predicted by structure-mapping theory, 
common higher-order relational structure is a crucial determinant of the 
subjective “goodness” of an analogy. Subjects rated AN matches (R2 + 
R1) as more sound than FOR matches (R1), but SS matches (R1 + OA) 
were rated as no better than FOR matches. These results make it unlikely 
that soundness is determined by the mere number of features shared. 
Rather, they suggest that common higher-order relational structure is 
particularly valued. 

The dissociation between access and inferential soundness is striking. 
Analogical matches were rated as sound but were not well retrieved, and 
the opposite was true for surface matches. It appears that subjects’ mem- 
ory access mechanisms did not provide them with the matches that they 
themselves considered most valuable. 


EXPERIMENT 2 


Experiment 2 was conducted to replicate and extend Experiment 1. 
First, we wished to investigate the possibility that the retrievability or- 
dering found in Experiment 1 was simply a function of overall similarity, 
as suggested by Holyoak and Koh (1987). It could have been the case that 
the surface-similar cues were simply more similar to their originals than 
were the analogy cues, and the analogies more similar than the FOR 
matches. In this case, there would be no reason to invoke a special role 
for surface attributes in similarity-based access and it might be possible to 
preserve a unitary similarity view of retrieval. (Soundness evaluations 
would be dealt with separately in such a theory of transfer.) To address 
this concern, we added a similarity-rating task, again performed by inde- 
pendent subjects. 

A second concern was the soundness-rating task. We wanted these 
ratings to reflect subjects’ assessment of the inferential utility of the 
match. However, we became concerned that the instructions might have 
biased subjects against surface matches.⁸ Although we described sound 
matches as ones from which predictions could be made, we also described 
spurious matches as ‘‘superficial’’ and cautioned the subjects against us- 
ing shallow similarities as a basis for soundness judgments. Subjects 
could well have interpreted these instructions as warning against using 
object-level commonalities, in which case their seeming preference for 
relational matches is uninformative. To be sure we were gauging subjects’ 
intuitions about inferential usefulness, we modified the instructions for 
the soundness-rating task, removing any caution against using surface or 
superficial features and stressing instead that subjects should judge 
whether they could make inferences or predictions from one story to the 
other. 

A third concern was to avoid contamination from the reminding task to 
the ratings tasks, since the ratings task must of necessity follow the re- 
minding task. We therefore conducted each of the ratings tasks with 
independent subjects (as well as with the recall subjects). 


⁸ We thank Keith Holyoak and an anonymous reviewer for pointing out this potential 
problem. 

In addition to these concerns, an important addition was made to the 
design. In Experiment 1, we began with FOR matches (R1) and added 
either object-attribute commonalities to create surface matches (OA + 
R1) or higher-order relational commonalities to create analogy matches 
(R1 + R2), with the former leading to an access advantage and the latter 
leading to a soundness advantage. In Experiment 2, we completed this 
design, adding a literal similarity condition in which all three kinds of 
commonalities were present—OA, R1 and R2. With the addition of literal 
similarity (LS) matches, the set of match types forms a 2 x 2 design, as 
shown in Fig. 2. 

This design enables a further test of the predictions for soundness. 
According to structure-mapping, the subjective soundness of a similarity 
match depends only on the degree to which relational structure is shared. 
Therefore, literal similarity matches and analogies should be rated as 
highly and about equally sound, since both kinds of matches include 
substantial and roughly equivalent common relational structure. In par- 
ticular, LS soundness ratings should be no higher than AN soundness, 
despite greater numbers of shared predicates. 

For access, the surface superiority hypothesis predicts that both LS 
matches and SS matches will be better recalled than AN matches and 
FOR matches. Thus the advantage of LS over AN should be greater than 
the advantage of LS over SS. In contrast, the goal-oriented memory view 
predicts that LS and AN will be better recalled than SS and FOR 
matches. Finally, the unitary similarity account predicts that the 
accessibility pattern will match the pattern of subjective similarity. The design 
allows two parallel tests of the relative contributions, since LS-AN and 
SS-FOR both reflect the additional contribution of object commonalities, 
while LS-SS and AN-FOR reflect the additional contribution of 
higher-order relational commonalities. 

Fig. 2. Design of materials for Experiment 2. All pairs shared first-order relations (e.g., 
events). 


Method 


Subjects 


The subjects for the reminding task were 36 undergraduates from the University of Illinois 
who received class credit for participation. The subjects for the soundness and similarity 
rating tasks were 20 volunteer subjects from Hampshire College, who performed the 
soundness task with new instructions, and 40 undergraduates at the University of Illinois, 20 of 
whom received psychology class credit for participation and 20 of whom were paid for their 
participation. The paid and unpaid subjects from University of Illinois were evenly distrib- 
uted across the similarity rating task and the soundness rating task with old instructions. 


Materials and Design 


The materials were the same as those used in Experiment 1 with two additions. First, two 
new story sets were added for a total of 20. Second, the story sets were expanded to include, 
along with the original story, literal similarity (LS) cues as well as AN, FOR, and SS cues. 
This addition required a few minor changes in the prior stories to preserve the parallel 
design. The LS cues had in common with the original stories all three levels: object attrib- 
utes (OA), first-order relations (R1), and higher-order relations (R2); SS had two levels (OA 
and R1) as did AN (R1 and R2); and FOR had one level of commonality (R1). (See Table 2 
and Appendix for examples of LS cues.) 

As in Experiment 1, Similarity type was varied within subjects. Subjects were divided into 
four counterbalancing groups which each received ¼ of the stories in each of the four 
similarity conditions. Each story set was rated equally often in each similarity condition and 
no subject received more than one pair from any story set. The presence of matching 
higher-order structure and object attributes was varied systematically among the four match 
types, making a 2 x 2 design: Relational similarity (within) x Object similarity (within). 
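
One way to satisfy the counterbalancing constraints described above is a simple rotation (Latin-square style) of conditions over story sets. The sketch below is only an illustration of such a scheme: the paper does not say how sets were assigned to groups, so the rotation rule and all names here are assumptions.

```python
from collections import Counter

CONDITIONS = ["LS", "AN", "SS", "FOR"]
N_SETS = 20    # story sets in Experiment 2
N_GROUPS = 4   # counterbalancing groups


def assignment(group: int) -> dict:
    """Map each story-set index to a condition by rotating conditions per group.

    The rotation rule is an assumed scheme, not the authors' documented one.
    """
    return {s: CONDITIONS[(s + group) % len(CONDITIONS)] for s in range(N_SETS)}


design = {g: assignment(g) for g in range(N_GROUPS)}

# Each group sees five story sets per condition, and each story set
# appears once in every condition across the four groups.
for g in range(N_GROUPS):
    print(g, dict(Counter(design[g].values())))
```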


Procedure 


Soundness. The soundness-rating task was run both with the original instructions and with 
new instructions that avoided biasing against surface similarities, as follows: 

“This part of the experiment is about what makes a good match between two stories or 
situations. We all have intuitions about these things. Some kinds of resemblances seem 
important, while others seem weak or irrelevant . . . In this part of the experiment, we want 
you to use your intuitions about soundness—that is, about when two situations match well 
enough to make a strong argument. . . . 

A sound match between two stories is one in which the essential aspects of the stories 
match. To put it another way, a sound match is strong enough that you can infer or predict 
things about the second story from the first. For example, suppose you read the first story 
and just the first part of the second story. Could you infer or predict, with fair accuracy, 
what happens in the rest of the second story? If you could predict it with reasonable 
accuracy, then this is a sound match. If you could not, then this is a weak, or spurious 
match.” 

Similarity. The method for the similarity-rating task was similar to that used for the 
soundness ratings. Subjects saw pairs of stories and rated them on a 1-5 point rating scale 
with 5 = extremely similar and 1 = extremely dissimilar. In the instructions, “similarity” 
was explained as “overall, how the characters and actions in the two stories resemble one 
another; or how much there is a general resemblance between the two stories.” The 
vagueness of the description was intentional: we wanted this rating to reflect the subjects’ own 
opinion of what makes two items similar. 

Reminding. The procedure for the reminding task was as in Experiment 1. In the first 
session, subjects were told to read and remember 20 original stories and 12 filler stories. One 
week later, they returned for the reminding session and were given booklets of 20 cue 
stories, each of which matched one and only one original story. The instructions for the 
reminding task were identical to those of Experiment 1.⁹ 

The data were scored as in Experiment 1, with two blind judges rating each recall on a 0-5 
scale for accuracy to the original story. The judges agreed 79% of the time and were within 
1 rating point of each other 97% of the time. The proportion of remindings above criterion 
and the score for keyword recall were computed as in Experiment 1. 


Results and Discussion 
Soundness 


As predicted, subjects based their soundness judgments primarily on 
the degree to which the two stories shared a relational system. (See Fig. 
3a.) Planned comparisons revealed an advantage for LS matches (M = 
4.41) and AN matches (M = 4.16) over SS matches (M = 2.70) and FOR 
matches (M = 2.58). As predicted, there was no significant difference 
between LS and AN matches nor between SS and FOR matches. A 2 x 
2 Relational similarity (within) x Object similarity (within) analysis of 
variance confirmed a main effect of Relational similarity, F(1,19) = 
185.29, p < .0001. Neither the main effect of Object similarity nor the 
interaction of Relational similarity and Object similarity was significant 
(F(1,19) = 1.99, p < .17 and F(1,19) = .638, p < .6). The item analysis 
also revealed a main effect of Relational similarity, F(1,19) = 86.19, 
p < .0001. 
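
In a fully within-subject 2 x 2 design, each one-degree-of-freedom effect can be tested by forming a per-subject contrast score and squaring the one-sample t statistic on those scores, which gives F(1, n - 1). The sketch below illustrates that standard equivalence; it is not the authors' analysis code, and the ratings are hypothetical.

```python
from statistics import mean, variance


def contrast_F(scores, weights):
    """F(1, n - 1) for a within-subject contrast.

    scores: list of per-subject dicts mapping condition -> rating.
    weights: dict mapping condition -> contrast weight.
    Requires non-zero variance in the contrast scores.
    """
    contrasts = [sum(weights[c] * subj[c] for c in weights) for subj in scores]
    n = len(contrasts)
    F = n * mean(contrasts) ** 2 / variance(contrasts)  # equals t squared
    return F, (1, n - 1)


RELATIONAL = {"LS": 1, "AN": 1, "SS": -1, "FOR": -1}   # R2 present vs. absent
OBJECT = {"LS": 1, "AN": -1, "SS": 1, "FOR": -1}       # OA present vs. absent
INTERACTION = {"LS": 1, "AN": -1, "SS": -1, "FOR": 1}

# Hypothetical soundness ratings for four subjects; not real data.
ratings = [
    {"LS": 5, "AN": 4, "SS": 3, "FOR": 2},
    {"LS": 4, "AN": 5, "SS": 2, "FOR": 3},
    {"LS": 5, "AN": 4, "SS": 2, "FOR": 2},
    {"LS": 4, "AN": 4, "SS": 3, "FOR": 3},
]

F_rel, df = contrast_F(ratings, RELATIONAL)
F_obj, _ = contrast_F(ratings, OBJECT)
F_int, _ = contrast_F(ratings, INTERACTION)
print(df, round(F_rel, 2), round(F_obj, 2), round(F_int, 2))
```

With 20 subjects the same computation yields the (1,19) degrees of freedom reported in the text.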

We also ran the soundness-rating task on 20 other subjects using the 
instructions from Experiment 1, which emphasized avoiding superficial 
commonalities. The same pattern of results was obtained as above. Sub- 
jects rated LS matches (M = 4.26) and AN matches (M = 4.01) as more 
sound than SS matches (M = 3.10) and FOR matches (M = 2.87) (Bon- 
ferroni comparisons). The analysis of variance produced the same pattern 
of significance. 


Similarity 


As predicted, subjects’ similarity ratings were sensitive to both rela- 
tional commonalities and object-attribute commonalities. (See Fig. 3b.) 


⁹ These subjects then performed the soundness-rating and similarity-rating tasks. The 
patterns of results were substantially the same as those of the independent subjects and are 
omitted for brevity. 

Fig. 3. (a) Mean soundness rating for the four similarity types in Experiment 2. (b) Mean 
similarity rating for the four similarity types in Experiment 2. 


Bonferroni comparisons revealed an advantage of LS matches (M = 4.50) 
over AN matches (M = 4.09), of AN matches over SS matches (M = 
3.40), and of SS matches over FOR matches (M = 2.88). The analysis of 
variance revealed main effects of Relational similarity, F(1,19) = 85.78, p 
< .0001, and Object similarity, F(1,19) = 7.01, p < .02. The interaction 
between Relational similarity and Object similarity was non-significant, 
F(1,19) = .24. The item analysis also revealed main effects of Relational 
similarity, F(1,19) = 65.25, p < .0001, and Object similarity, F(1,19) = 
5.98, p < .05. 

The comparison between similarity and soundness is revealing. Similarity, 
like soundness, is sensitive to the presence of common higher-order 
relational structure. In fact, across the 80 story matches, similarity 
is highly correlated with soundness, r(78) = .69, p < .001. This is consistent 
with accounts of similarity that emphasize alignment of structure, 
rather than simple feature-matching (Falkenhainer, Forbus, & Gentner, 
1989; Gentner, 1983, 1989; Goldstone & Medin, in press; Markman & 
Gentner, 1990, in press; Medin, Goldstone & Gentner, in press). How- 
ever, whereas similarity is sensitive to both object commonalities and 
relational commonalities, soundness appears more specifically focused on 
relational commonalities. 


Reminding 


The pattern for reminding was very different from that for soundness 
and similarity. Object commonalities contributed strongly to memory ac- 
cess and higher-order relational commonalities had little effect. Subjects 
recalled more LS (M = .56) and SS (M = .53) matches than AN (M = .12) 
and FOR (M = .09) matches (Bonferroni comparisons). The difference 
between LS and SS and that between AN and FOR were nonsignificant. 
Figure 4 shows the proportion of remindings (i.e., the proportion of 
recalls receiving scores above criterion) for the four kinds of similarity 
matches. 

A 2 x 2 Relational similarity (within) x Object similarity (within) analysis 
of variance performed on the number of remindings revealed a main 
effect of Object similarity, F(1,35) = 108.73, p < .0001. There was no 
effect of Relational similarity, F(1,35) = .89. Also, as throughout the 
reminding results, there was no interaction between Object similarity and 
Relational similarity. An item analysis also revealed a main effect of 
Object similarity, F(1,19) = 78.45, p < .0001. 

Fig. 4. Proportion recalled for the four similarity types in Experiment 2. 
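
The two parallel tests afforded by the 2 x 2 design can be made concrete with simple arithmetic on the proportions recalled reported above. The sketch below only subtracts the reported condition means; the variable names are ours.

```python
# Proportion recalled per match type, as reported for Experiment 2.
recall = {"LS": 0.56, "SS": 0.53, "AN": 0.12, "FOR": 0.09}

# Added contribution of object commonalities (OA).
object_gain = {
    "LS-AN": recall["LS"] - recall["AN"],
    "SS-FOR": recall["SS"] - recall["FOR"],
}

# Added contribution of higher-order relational commonalities (R2).
relational_gain = {
    "LS-SS": recall["LS"] - recall["SS"],
    "AN-FOR": recall["AN"] - recall["FOR"],
}

for label, gains in (("object", object_gain), ("relational", relational_gain)):
    for name, diff in gains.items():
        print(f"{label:10s} {name:7s} {diff:.2f}")
```

Both object contrasts come to .44 and both relational contrasts to .03, which is the pattern expected from a large Object similarity effect with no Relational similarity effect and no interaction.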

Similar analyses were performed on the overall quality-of-recall scores 
and the keyword data. Table 4 shows all three measures of reminding— 
proportions of remindings, mean ratings of quality of recall, and mean 
keyword scores across the four similarity match types. For comparison, 
the mean ratings of soundness and similarity are also included. 

The quality-of-reminding results showed the same ordering of means: 
there was a significant advantage for LS matches (M = 1.92) and SS 
matches (M = 1.64) over AN matches (M = .44) and FOR matches (M = 
.27). The analysis of the quality-of-reminding data revealed a main effect 
of Object similarity, F(1,35) = 124.18, p < .0001, and also a main effect 
of Relational similarity, F(1,35) = 4.98, p < .05, presumably reflecting 
the advantage of relational matches in construction during recall. The 
item analysis also revealed a main effect of Object similarity, F(1,19) = 
77.82, p < .0001, and a main effect of Relational similarity, F(1,19) = 
5.29, p < .01. The keyword results showed the same ordering of means 
(see Table 4), although there were no significant pairwise comparisons. 
The keyword analysis revealed a main effect of Object similarity, F(1,35) 
= 11.86, p < .01, but not of Relational similarity, F(1,35) = .5. An item 
analysis also confirmed a main effect of Object similarity, F(1,19) = 7.45, 
p < .05, but not of Relational similarity, F(1,19) = .65. 


TABLE 4 
Results of Experiment 2: Mean Ratings of Soundness and Similarity Compared to Three 
Measures of Reminding across the Four Similarity Types 

                                                      Match type 
                                   Literal similarity   Analogy     SS matches   FOR matches 
Measure                            (OA + R1 + R2)       (R1 + R2)   (OA + R1)    (R1)ᵃ 
Soundnessᵇ,ᶜ                       4.41                 4.16        2.70         2.58 
Similarity                         4.50                 4.09        3.40         2.88 
Proportion recalled                .56                  .12         .53          .09 
Proportion of keywords recalled    1.11                 .17         .81          .25 
Quality of reminding               1.92                 .44         1.64         .27 

ᵃ Here OA refers to object commonalities, R1 to first-order relation commonalities and R2 
to higher-order relational commonalities. 

ᵇ Soundness and similarity were rated by two independent groups of subjects. 

ᶜ Soundness, similarity and quality of reminding are mean ratings on 1 (low) to 5 (high) 
scales. 

The results of the reminding task replicate and extend the pattern of 
object-dominance in remindings found in Experiment 1. Matches that 
shared object-level commonalities—LS and SS matches—produced many 
more remindings than AN and FOR matches. In contrast, higher-order 
relational similarity did not lead to a significant recall advantage in any of 
the three measures except quality of reminding, and this measure seems 
likely to overestimate the importance of relational similarity in memory. 
(This is because, as discussed above, any tendency subjects may have to 
use the cue story as a template for their written recall will tend to lead to 
improved quality-of-reminding scores for LS and AN (the two relationally 
similar match types) but not for SS and FOR.) Moreover, even under the 
quality of reminding measure, AN matches were not significantly better 
than FOR matches in access. The other two measures of recall, keyword 
recall and proportion recalled, show effects of object similarity but not of 
higher-order relational similarity. 

Comparisons across tasks. As in Experiment 1, the results of the re- 
minding task contrast sharply with those of the soundness-rating task. In 
two different versions of the soundness task, subjects consistently relied 
on relational commonalities. Thus, as predicted by structure-mapping, 
common systematic relational structure appears paramount in subjects’ 
judgments of inferential soundness. 

This attention to relational structure was not reflected in subjects’ recall 
performance. There was no correlation between the soundness ratings 
and proportion reminding, r(78) = .008, NS. Instead, similarity-based 
access to memory appears strongly influenced by surface similarity, 
with common relational structure playing a smaller role. In 
Experiment 2, common higher-order relational structure had no signifi- 
cant effect on the proportion of remindings; in Experiment 1 there was an 
effect, in that analogies were better recalled than FOR matches, but the 
gain was small compared to the gain for SS matches. 

We now turn to whether memorial access reflects overall similarity. In 
this case, the reminding results should parallel the similarity ratings. The 
results do not bear out this possibility. There was no correlation between 
similarity and proportion recalled, r(78) = .17, NS. Further, as can be 
seen in Figs. 3b and 4, the results of the reminding task show a very 
different pattern from those of the similarity-rating task. 
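
The contrast between the measures can be illustrated with a Pearson correlation computed over the four condition means reported for Experiment 2. This is only a rough sketch: the correlations in the text were computed across the 80 individual story matches (hence 78 degrees of freedom), which are not available here, so the values produced below are not the paper's statistics and should not be compared with them numerically.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length, n >= 3")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Condition means reported for Experiment 2, in the order LS, AN, SS, FOR.
similarity = [4.50, 4.09, 3.40, 2.88]
soundness = [4.41, 4.16, 2.70, 2.58]
proportion_recalled = [0.56, 0.12, 0.53, 0.09]

r_sim_sound = pearson_r(similarity, soundness)
r_sim_recall = pearson_r(similarity, proportion_recalled)
r_sound_recall = pearson_r(soundness, proportion_recalled)

df = 80 - 2  # degrees of freedom for r over 80 story matches, as in r(78)
print(round(r_sim_sound, 2), round(r_sim_recall, 2), round(r_sound_recall, 2), df)
```

Even at this coarse level, similarity tracks soundness far more closely than either rating tracks recall.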

But although the pattern for similarity does not resemble that for recall, 
it does rather resemble that for soundness, and as noted above, similarity 
is highly correlated with soundness. Perhaps this degree of attention to 
structure in similarity judgments is a little surprising. An advocate of the 
“memory-reflects-similarity” view might even suspect that our subjects 
were not reporting their true feelings of similarity. But the finding that 
subjects are sensitive to common relational structure in judging similarity 
is consistent with several other recent findings (e.g., Markman & Gentner, 
1990, in press; Goldstone, Gentner, & Medin, 1991). We return to 
this point in the discussion. 

Subjects considered relational matches more similar and more sound 
than object matches yet recalled more object matches than relational 
matches. These results support the surface superiority account of re- 
trieval over the clever-indexing view. Further, these results suggest that 
different kinds of similarity govern the process of access to long-term 
memory and the processes of mapping and evaluating similarity compar- 
isons when both members are present, arguing against a unitary model of 
similarity in transfer. 


EXPERIMENT 3 


In the first two studies, surface matches were highly accessible in mem- 
ory. Adding object commonalities to stories that shared first-order rela- 
tions was much more effective in promoting recall than adding higher- 
order relations. This seems to support the claim of surface superiority in 
access. But there is another interpretation, namely, that the SS stories 
actually invited subjects to build appropriate higher-order relations, 
which then contributed to the reminding (suggested by Hammond, per- 
sonal communication). The SS stories shared both objects and events 
with the base stories. If we suppose that particular combinations of ob- 
jects and events are associated with typical higher-order causal patterns, 
then subjects receiving the SS matches might have invoked higher-order 
relational commonalities as well. For example, presented with an SS 
match that includes one country invading another, subjects might think of 
typical causal interactions among warring countries, and these higher- 
order patterns might contribute to their being reminded of the original 
story. In this case the SS matches would effectively have been (OA + R1 
+ R2) matches, possessing both higher-order commonalities and surface 
commonalities. Their retrievability advantage over AN (R1 + R2) could 
then be due to their effectively possessing more matching levels. Such an 
explanation would preserve the possibility that common higher-order 
structure is the crucial element in retrievability. 

In Experiment 3, we tested this possibility. We created object-only 
matches—pairs of stories that had common object-descriptions but vir- 
tually no relational commonalities either in first-order events or in higher- 
order causal structure, as shown in Table 5. If in the first two experiments 
the seeming object-superiority in memory access was an artifact of 
higher-order relations being invoked, then object matches by themselves 
should be ineffective at promoting recall. In particular, objects-only 
matches (OA) should lead to considerably less retrieval than analogies 




TABLE 5 
Sample Materials from Experiment 3, Showing a Base Story and its Objects-Only Match 


Base story 

Two small countries, Bolon and Salam, were adjacent to a large, warlike country called 
Mayonia. Bolon decided to make the best of the situation by taking over Salam. Salam 
started looking for aid from other strong countries but soon Bolon succeeded in taking it 
over. Then victorious Bolon proposed to make a treaty with its warlike neighbor Mayonia. 
Bolon proposed to give Mayonia control over Salam in exchange for a guarantee that Bolon 
would remain independent. Mayonia responded by overrunning both Bolon and Salam. 
Bolon was so busy maintaining control of Salam, it could do nothing to stop Mayonia. 
Thereupon Mayonia installed puppet governments in both Bolon and Salam and took over 
the newspapers and radio stations. 


Objects-only match 


Two weak nations, Lincoln and Moreland, bordered each other. Both countries relied 
heavily on the tourist trade to keep their economies afloat. They competed with each other 
over which one of them would get the most tourists. Meanwhile, another nearby nation, 
Chad, had a very strong economy with a thriving tourist trade. Tourist cruises flocked into 
its harbors and planes full of visitors were constantly landing in its airport. Because of this, 
Moreland tried to join forces with Chad in its new advertising campaign to entice still more 
tourists. Unfortunately a hurricane hit the coast and bankrupted all three nations. 


(R1 + R2). Apart from this prediction, it is of some interest to know to 
what degree common object-descriptions by themselves can promote 
access. 

This design also allows a further test of the predictions for soundness 
and similarity. According to structure-mapping, soundness should be 
based on the degree of common relational structure: highest for AN 
matches (R1 + R2), then SS matches (OA + R1), and finally OO (objects-only) 
matches (OA). Similarity should increase with the number of commonalities, 
so that OO matches should be rated lower in similarity than 
SS and AN matches. If we further find that AN is rated as more similar 
than SS, then this will strengthen the conclusion from Experiment 2 that 
structural commonalities are important in subjective similarity. 

The same cued-recall method was used as in the prior studies, with one 
change. The keyword measure, which yielded consistent but weak re- 
sults, was dropped in order to simplify the scoring procedure. 


Method 
Subjects 


The subjects were 72 undergraduates from the University of Illinois, who were fulfilling 
a course requirement. Thirty-two subjects participated in the reminding task, 20 in the 
similarity-rating task and 20 in the soundness-rating task. 

Materials and Design 

The materials were similar to those of the previous study, except that only AN matches 
and the two kinds of surface matches were used. Each subject received seven analogy 
matches (AN) and seven superficial matches. For half the subjects, the superficial matches 
were OO matches, which shared only object descriptions, and not first-order events nor 
causal structure. Table 5 shows sample materials. The other half received SS matches of the 
same kind used in the prior experiments, which shared first-order relations as well as 
objects. In addition, each subject received six LS matches (the same six across all subjects). 
This was done to give subjects some literally similar remindings to anchor their responses. 
Within each group, subjects were further divided into two counterbalancing groups that 
differed as to which stories were presented in each similarity type. Subjects were divided 
into two groups, an AN-SS group and an AN-OO group. 


Procedure 


Reminding. The procedure was the same as that used in Experiments 1 and 2, except that 
keyword scoring was dropped. In the first session, the subjects read the 20 original stories 
and 12 filler stories. One week later, they were given 20 matching stories in the reminding 
task. Two judges rated the subjects’ recalls. The inter-judge reliability was 75.8%. 

Soundness and similarity. The procedure was as in Experiment 2, including the improved 
soundness-rating instructions. 


Results and Discussion 


Soundness 


Table 6 shows the results for soundness, similarity, and reminding. As 
predicted, the soundness ratings reflected degree of relational overlap. 
The AN matches (M = 3.85) were rated as significantly more sound than 
both the SS matches (M = 2.38) and the OO matches (M = 1.76) (within- 
group t(9) = 3.03, p < .01 and t(9) = 9.73, p < .0001, respectively). The 


TABLE 6 
Results of Experiment 3: Mean Ratings of Soundness and Similarity Compared to Three 
Measures of Reminding across the Three Similarity Types 

                                              Match type 
                        Literal similarityᶜ   Analogy     SS matches   OO matches 
                        (OA + R1 + R2)        (R1 + R2)   (OA + R1)    (OA)ᵃ 
Soundnessᵇ              4.16                  3.85        2.38         1.76 
Similarity              4.59                  4.01        3.21         2.30 
Proportion recalled      .62                   .07         .52          .17 
Quality of reminding    1.79                   .22        1.39          .47 

ᵃ OA refers to object attribute commonalities, R1 to first-order relational commonalities, 
and R2 to higher-order relational commonalities. 
ᵇ Soundness, similarity, and quality of reminding are mean ratings on 1 (low) to 5 (high) 
scales. 
ᶜ The literal similarity results are included for comparison, although these matches served 
merely as anchoring stimuli. 

ments. One way to save the clever-indexing position would be to argue 
that the apparent surface superiority in retrieval was merely an artifact of 
the fact that the first-order relational commonalities in the SS matches 
gave rise to further higher-order structural commonalities. This possibil- 
ity is ruled out by the fact that not just the SS matches but also the OO 
matches are superior to the AN matches. The fact that OO matches (with 
object-attributes [OA] only) were recalled better (17% retrieval) than the 
purely relational AN matches (R1 + R2, at 7% retrieval) indicates that 
object-level commonalities, even by themselves, can compete with rela- 
tional structure matches in accessibility. 

On the other hand, the results of Experiment 3 also tend to rule out the 
possibility that the high recall of SS matches was due simply to the pres- 
ence of object commonalities. Had this been the case, the OO and SS 
matches would have been equally well recalled in Experiment 3. In fact, 
SS matches (OA + R1, at 52%) were recalled significantly more often than 
OO matches. 

As predicted, and as in the prior studies, the soundness ordering re- 
flects the degree of relational overlap: AN, then SS, then OO. As before, 
this ordering contrasts sharply with the ordering of recallability: for ex- 
ample, analogy is rated as most sound but is least well recalled. Also as 
in Experiment 2, the similarity ordering is much like the soundness or- 
dering, supporting the claim that structural commonalities are important 
in similarity, and quite unlike the accessibility ordering, undermining the 
notion of a unitary similarity governing the various stages of transfer. 


EXPERIMENT 4 


We have interpreted these results as indicating that similarity-based 
retrieval is more sensitive to object similarity than to relational similarity. 
But an alternate possibility is that the relational disadvantage arises from 
encoding and storage rather than from retrieval. Perhaps subjects simply 
failed to process the original stories with sufficient depth to be able to tell 
the difference between surface and deep matches, or perhaps they forgot 
the higher-order structure. This possibility was raised by Hammond, 
Seifert, and Gray (1991), who carried out a study using the same materials 
as in our Experiment 2, but varying whether subjects received intact or 
scrambled stories as retrieval probes. Not surprisingly, they found that 
retrieval was worse with scrambled than intact probes. However, the 
overall pattern was the same in both conditions as it was in our study: 
stories with surface matches (LS and SS) were retrieved better than sto- 
ries with low surface similarity (AN and FOR). Hammond, Seifert and 
Gray pointed out that the similarity in retrieval patterns between scram- 
bled and intact probes is consistent with the possibility that the initial 
encodings were superficial. Although they concede that this pattern is 
also consistent with our claims that the encodings were adequate and the 
retrieval was surface-oriented, their point is important and must be dealt 
with. 

To determine the locus of the surface superiority effect, we need to 
know whether subjects possessed the requisite relational knowledge at 
test time. Therefore we carried out a further study. We gave subjects the 
same study task as in Experiment 2—20 base stories plus 12 fillers—and 
tested them one week later with a forced-choice recognition task in which 
they received the LS match (OA + R1 + R2) and the SS match (OA + 
R1). Since these two matches are equated for degree of surface match, if 
subjects prefer the LS match, we can infer that their encodings of the base 
stories included sufficient R2 information to permit the distinction. 

A second group of subjects was given the same task, except that their 
recognition task consisted of AN matches (R1 + R2) versus FOR 
matches (R1). The reasoning here is parallel to the previous case, but with 
an added refinement. If subjects choose the AN match over the FOR 
match, this will indicate that their stored representations possessed suf- 
ficient higher-order relational information to make the distinction, and 
that the AN and/or FOR match is a good enough retrieval cue to provide 
access to the base representation. 


Method 
Subjects 


The subjects were 27 undergraduates from Northwestern University, 14 in Group 1 and 13 
in Group 2, who received course credit for their participation. 


Materials and Design 


The materials were as in Experiment 2: 20 base stories plus 12 fillers, with the associated 
LS and SS matches used at test. All subjects received the same encoding task. Each of the 
two groups was further divided into four counterbalancing groups to control left-right po- 
sition of the recognition test items and order of presentation. 


Procedure 


In the first session, subjects read the 32 initial stories with the same instructions as in the 
previous studies. One week later, they returned and were given a forced-choice recognition 
task. Group 1 received the 20 pairs of LS and SS matches for the 20 base stories. They were 
told to choose the one that best matched the story they had read. The procedure was the 
same for Group 2, except that the AN and FOR matches were given in the recognition test. 


Results and Discussion 


Group 1 subjects chose the LS matches 82% of the time (M = 16.46 out 
of 20), χ² = 61 (df = 12), p < .0001. From this we can infer that the 
subjects’ stored representations contained higher-order relational 
information. This suggests that the locus of the surface bias is at retrieval, and 
this conclusion is further buttressed by the results for Group 2. These 
subjects were at chance: they chose the AN response only 55% of the 
time (M = 11 out of 20). They presumably possessed the same 
relational representation as the Group 1 subjects, since they received 
precisely the same treatment. But whereas Group 1 subjects, with their 
surface-similar probes, were able to successfully access their stored 
representations, Group 2 subjects, with their relationally similar probes, were 
not. 

From these results we infer first, that subjects encoded and retained the 
relational structure of the base stories. Second, they were better 
able to retrieve these descriptions when they later encountered 
surface-similar stories than when they later encountered relationally similar 
stories. We conclude that the locus of the surface superiority effect is at 
retrieval. 


COMPUTATIONAL SIMULATION 


The results of these four experiments present a complex picture of 
similarity in transfer. These results, along with the [...] 
(1) the model must be able to store structured representations (the structured 
representation criterion); (2) it must capture processes of structural 
alignment and mapping over these representations (the structural mapping 
criterion). At the same time, the retrieval process must be such that (3) 
occasional relational remindings occur (the rare insights criterion); (4) 
the majority of its retrievals are LS matches (the primacy of the mundane 
criterion); and (5) retrievals based on surface similarity are frequent (the 
surface superiority criterion). Finally, criterion (6), the scalability 
criterion, is that the system must be capable of being extended to large 
memories. 

Current models can be divided into those that capture the first two 
criteria and those that capture the last three. Models of similarity that 
assume smart processes operating over richly articulated representations 
can capture the first two criteria. Most case-based reasoning models are of 
this character (Hammond, 1989; Kolodner, 1984; Schank, 1982). These 
models are sufficiently rich to capture processes like case-alignment and 
even more clever processes, such as pragmatic matching and adaptation 
of cases (Kass, Leake, & Owens, 1987; Riesbeck & Schank, 1989). 
However, these models typically assume use of intelligent indexing based on 
significant higher-order abstractions, so that people typically 
access the best structural match. Such models typically fail to predict the 
surface superiority effect in retrieval. It is also not clear how abstract 
indexing would scale with very large memories. 

The opposite set of advantages and disadvantages holds for approaches 
[...] & Pike, 1989 for a review) and in many connectionist models of learning 
(e.g., Smolensky, 1988). These models, with their nonstructured knowl- 
edge representations and their relatively simple match processes, do not 
allow for the structural precision of people’s similarity judgments and 
inferences. However, the use of feature-vectors has some advantages for 
modeling access to long-term memory. The computations are simple 
enough to make it feasible to compute many such matches and choose the 
best, thus satisfying criterion (6), scalability. It should also be 
straightforward for feature-vector representations to satisfy the surface 
superiority criterion (5), and (provided surface and structural features are 
correlated), the primacy of the mundane criterion (4). 

We seek to combine the advantages of both these approaches by using 
a two-stage model of retrieval and mapping. The first stage carries out a 
cheap but error-prone search and the second stage carries out a full 
structural matching and mapping process. We begin with the second stage, 
SME, a simulation of mapping and evaluating comparisons. Then we 
summarize the large simulation, MAC/FAC, a simulation of similarity- 
based retrieval and mapping which uses SME as its second stage. We 
compare their results with the data presented above. 


SME and Subjective Soundness 


The Structure-Mapping Engine (SME) is a simulation of the process of 
interpreting and evaluating comparisons. Given a pair of descriptions it 
carries out a structural alignment, projects candidate inferences from base 
to target, [...] (Falkenhainer, Forbus, 
& Gentner, 1986, 1989; Forbus & Oblinger, 1990; Gentner, 1989b; 
Skorstad, Falkenhainer, & Gentner, 1987). SME uses a local-to-global 
matching process to arrive at the maximal structurally consistent 
interpretation(s) of an analogy or similarity comparison. Given propositional 
representations of two potential analogs, SME begins by finding all possible 
local matches between predicates in the base and predicates in the target. 
For each predicate in the base, it finds all possible matches in the target. 
Two items can match if either (a) they are intrinsically alike¹⁰ or (b) they 
are corresponding arguments of matching predicates and are objects or 
functions. 

¹⁰ SME’s constraint of matching identical predicates assumes canonical conceptual 
representations, not lexical strings. Two concepts that are similar but not identical (such as 
“bestow” and “bequeath”) are assumed to be decomposed into a canonical representation 
language so that their similarity is expressed as a partial identity (here, roughly “give”), [...] 

These local matches are often mutually inconsistent. In the next stage, 
SME imposes structural consistency: it combines these local hypotheses 
into subsystems that are mutually consistent (e.g., that have one-to-one 
object bindings and parallel relational connectivity). It then tries to com- 
bine these subsystems into larger mutually consistent mappings. In this 
way it generates one or a few global interpretations of the comparison.¹¹ 
SME also draws any further candidate inferences that would follow from 
the interpretation. Predicates that belong to the base system but were not 
initially present in the corresponding target system are hypothesized to be 
true in the target system. 
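
The two ideas in this paragraph, one-to-one object bindings and the projection of candidate inferences, can be illustrated with a toy sketch. It is not SME's local-to-global algorithm: it assumes flat statements of the form (predicate, object, ...), requires identical predicates, and simply tries every one-to-one assignment of base objects to target objects. All names and the example statements are hypothetical.

```python
from itertools import permutations


def rename(statement, mapping):
    """Replace each object in (predicate, obj, ...) by its image under mapping."""
    predicate, *objects = statement
    return (predicate, *(mapping[o] for o in objects))


def objects_of(statements):
    """All object names mentioned in a list of statements, in sorted order."""
    return sorted({o for _, *objs in statements for o in objs})


def best_mapping(base, target):
    """Brute-force the one-to-one object mapping that aligns the most statements."""
    target_set = set(target)
    base_objs, target_objs = objects_of(base), objects_of(target)
    best, best_hits = {}, -1
    # Each permutation is one candidate one-to-one assignment of base objects.
    for image in permutations(target_objs, len(base_objs)):
        mapping = dict(zip(base_objs, image))
        hits = sum(rename(s, mapping) in target_set for s in base)
        if hits > best_hits:
            best, best_hits = mapping, hits
    return best, best_hits


def candidate_inferences(base, target, mapping):
    """Base statements that, once renamed, are not yet present in the target."""
    target_set = set(target)
    return [rename(s, mapping) for s in base if rename(s, mapping) not in target_set]


base = [("ATTACKS", "hunter", "hawk"),
        ("OFFERS", "hawk", "hunter"),
        ("SPARES", "hunter", "hawk")]
target = [("ATTACKS", "nation_a", "nation_b"),
          ("OFFERS", "nation_b", "nation_a")]

mapping, aligned = best_mapping(base, target)
inferences = candidate_inferences(base, target, mapping)
print(mapping, aligned, inferences)
```

In this example the only consistent binding maps the hunter onto the attacking nation, and the unmatched base statement is carried over as a hypothesis about the target. The exhaustive search is feasible only for a handful of objects, which is one reason a local-to-global process is attractive.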

Finally, each interpretation is given a structural evaluation. To ensure 
a preference for systematic relational structure, the depth of the common 
structure is taken into account in determining the evaluation. SME uses a 
cascade-like process in which predicates pass some multiple of their 
match score down to their arguments (Falkenhainer, Forbus & Gentner, 
1986, 1989; Forbus & Gentner, 1989). This means that a deeply nested 
interpretation will be preferred over a flat interpretation containing the 
same number of matching predicates. 
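
A minimal sketch of why a cascade favors depth is given below. It assumes that every matched predicate contributes one unit of local evidence, that a fixed fraction (`TRICKLE`, a hypothetical value) of a match's score is passed to each argument, and that object matches have no local evidence of their own. These are simplifications for illustration, not SME's actual evaluation rules.

```python
TRICKLE = 0.8  # hypothetical fraction of a match's score passed to each argument


def evaluate(expr, inherited=0.0, local=1.0):
    """Sum match scores over a tree of matched structure.

    expr is either an object name (str) or a tuple (predicate, arg1, arg2, ...).
    """
    if isinstance(expr, str):
        # Objects carry no local evidence; they score only what they inherit.
        return inherited
    own = local + inherited
    return own + sum(evaluate(arg, own * TRICKLE, local) for arg in expr[1:])


# Three matched predicates arranged as one nested system ...
nested = ("CAUSE", ("GREATER", "x", "y"), ("FLOW", "x", "y"))
# ... versus three matched predicates with no connecting higher-order relation.
flat = [("GREATER", "x", "y"), ("FLOW", "x", "y"), ("TOUCH", "x", "y")]

nested_score = evaluate(nested)
flat_score = sum(evaluate(e) for e in flat)
print(nested_score, flat_score)
```

Both interpretations contain three matching predicates, but the nested one scores higher because the evidence for CAUSE is inherited by its arguments and then by their arguments in turn.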


Comparing SME’s Evaluations with Human Soundness Judgments 


According to theory, the structural evaluations assigned by SME in 
analogy mode should ordinally match the soundness ratings assigned by 
human subjects across the story pairs. In analogy mode, SME initially 
ignores object attributes and matches identical relations; all other predi- 
cates are put into correspondence based on their structural roles. SME’s 
literal similarity mode is similar except that object attributes are also 
matched initially. The theoretical commitment here is that soundness 
depends only on common relational structure, and hence should be cap- 
tured by analogy mode, while similarity depends on an overall match (so 
object similarity as well as structural alignment is important). To test the 
soundness claims, we constructed propositional representations for 9 of 
the 20 story sets used in Experiment 2. The 9 base-AN pairs and 9 base- 
SS pairs were given to SME in analogy mode. The results are shown in 
Table 7, along with the human soundness results from Experiment 2. 

Consistent with the theory, SME’s structural evaluation scores are 
higher throughout for the base-AN pair than for the base-SS pair. To 
further test the apparent parallel between SME’s pattern of preference 
and that of human subjects, we computed for each story set the difference 
between SME’s base/AN score and its base/SS score and correlated this 
with the corresponding difference in the human soundness ratings (i.e., 
base/AN − base/SS). As predicted, we found a significant positive 
correlation, r(7) = .73, p < .05. 

¹¹ SME can be run exhaustively to produce all possible interpretations of a given 
comparison, but we think it more plausible that humans produce only one or two interpretations 
(Forbus & Oblinger, 1990). 


TABLE 7 
Comparison of SME’s Structural Evaluation Scores with Human Soundness Ratings 

                          SME’s structural             Human subjects’ 
                          evaluation scoreᵃ            soundness ratings 
Story #                   AN      SS      AN > SSᵇ     AN     SS     AN > SS 
5: Karla, hawk            23.5    17.0    +            4.4    2.4    + 
7: Julius, mule           26.0    21.0    +            4.6    2.0    + 
8: Percy, squirrel        19.5    17.5    +            4.2    3.2    + 
9: Steak, dog             38.5    33.0    +            4.0    1.8    + 
10: Boris, business       39.0    34.5    +            3.4    1.4    + 
13: Morris, prisoner      45.5    28.5    +            5.0    1.4    + 
15: Fred, shepard         55.0    45.5    +            4.8    1.8    + 
17: Pioneers, divide      21.5    19.5    +            4.2    2.2    + 
19: Cobra, Pierre         51.0    47.5    +            4.6    4.2    + 

ᵃ These figures are for SME in analogy mode. 
ᵇ + indicates that the score was greater for the analogy (AN) pair than for the 
surface-similarity (SS) pair. 
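
The difference-score correlation described here can be sketched as follows. The values are transcribed from Table 7 as printed; because they are rounded (and the transcription is an assumption), the coefficient computed this way need not reproduce the reported value exactly. The helper `pearson` is a plain textbook implementation, not the authors' analysis code.

```python
from math import sqrt

# (story set, SME AN, SME SS, human AN, human SS), transcribed from Table 7.
TABLE_7 = [
    (5, 23.5, 17.0, 4.4, 2.4),
    (7, 26.0, 21.0, 4.6, 2.0),
    (8, 19.5, 17.5, 4.2, 3.2),
    (9, 38.5, 33.0, 4.0, 1.8),
    (10, 39.0, 34.5, 3.4, 1.4),
    (13, 45.5, 28.5, 5.0, 1.4),
    (15, 55.0, 45.5, 4.8, 1.8),
    (17, 21.5, 19.5, 4.2, 2.2),
    (19, 51.0, 47.5, 4.6, 4.2),
]


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# AN minus SS for each story set, once for SME and once for the human ratings.
sme_diff = [sme_an - sme_ss for _, sme_an, sme_ss, _, _ in TABLE_7]
human_diff = [hum_an - hum_ss for _, _, _, hum_an, hum_ss in TABLE_7]

r = pearson(sme_diff, human_diff)
degrees_of_freedom = len(TABLE_7) - 2
print(f"r({degrees_of_freedom}) = {r:.2f}")
```

With nine story sets the test has seven degrees of freedom, matching the r(7) notation in the text.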


Summary 


SME was designed to embody the structural consistency and system- 
aticity constraints. While the fit between SME’s evaluation order and that 
of humans is not proof of the psychological reality of these constraints, it 
shows that this account is consistent with human performance. SME’s 
process of beginning with local matches and coalescing these into global 
structural matches seems apt for modeling human processing. For exam- 
ple, Goldstone and Medin (in press) have found that local similarities have 
their effects on mapping earlier than global relational similarities in a 
timed mapping task. Ratcliff and McKoon (1989) found that in a sentence-
matching task subjects could discriminate new from old sentences early in 
processing if the new sentences contained all new words (e.g., “Helen 
attracted Jeff.” vs. “Andrew accosted Mary.”); but only after about 700 
msec. could subjects reject sentences based on differences in relational 
structure (e.g., “Helen attracted Jeff.” vs. “Jeff attracted Helen.”). In 
pilot experiments using perceptual stimuli, in which subjects were timed 
under different kinds of mapping instructions, Arthur Markman and I 
have found that subjects are faster to choose on the basis of similar 
objects than on the basis of similar relations, even when the two rules 
dictate the same response. 

Aside from this timing evidence, SME’s local-to-global process of com- 
puting alignment has the attractive feature that it begins blindly. Since the 
emergence of clever or deep interpretations results from its ability to 
utilize interconnectivity between its initial local matches and its prefer- 
ence for systematicity and structural consistency, it does not need ad- 
vance knowledge of the content of the analogy as in some prior accounts 
(see Carbonell, 1986; Holyoak, 1985). The local-to-global algorithm is 
adaptable to a connectionist framework: similar processing principles 
have been incorporated into Holyoak and Thagard’s (1989a) ACME 
model of analogical mapping and Goldstone and Medin’s (in press) SIAM 
model of similarity processing. ACME has the attractive feature of com- 
bining structural, pragmatic and similarity-based constraints, though at 
the cost of failing to ensure structural consistency. SME’s use of struc- 
tural consistency permits spontaneous inferences from base to target. 
Hofstadter & Mitchell’s (in press) Copycat also develops an overall match 
from local pressures, using a parallel terraced scan to allocate attention 
among multiple local clusters (Hofstadter, 1984). 


Simulating Memory Retrieval: The MAC/FAC Model 


Of the six constraints on a model of similarity-based transfer, SME 
satisfies the first two: it accepts structural representations and produces 
structural mappings. Now we turn to the other four criteria—the rare 
insights criterion, the primacy of the mundane criterion, the surface su- 
periority criterion, and the scalability criterion. To simulate similarity- 
based retrieval we require a process that is strongly but not solely influ- 
enced by surface similarity and that is somewhat but not wholly insensi- 
tive to structural consistency. It should typically retrieve literally similar 
matches, often retrieve surface-similar matches, and occasionally retrieve 
purely analogous matches (Wharton, Holyoak, Downing, Lange and 
Wickens, 1991, 1992). Finally, it must be able to deal with the fact that 
although access to memory may be structurally insensitive, once the 
knowledge is retrieved its structure can be used. 

The model we propose, called MAC/FAC (for “Many are called but 
few are chosen”), uses a two-stage retrieval process (Gentner, 1989b; 
Gentner & Forbus, 1991). For a full description of the model see Gentner 
& Forbus (in preparation); here we briefly summarize it. The first stage 
(MAC) is a “wide-net” stage in which a crude, computationally cheap 
match process is used to pare down the vast set of memory items into a 
small set of candidates for more expensive processing (cf. Bareiss & 
King, 1989). The second stage, the FAC stage, is a full structural 
similarity matcher, namely, SME in literal similarity mode, as discussed 
above. The dissociation noted previously, we claim, can be understood in 
terms of the interactions of its two stages. MAC/FAC’s inputs are a pool 
of memory items and a probe: a description for which a match is to be 
found. Its output is a memory description and a comparison of this de- 
scription with the probe. We make minimal assumptions concerning the 
global structure of long-term memory: e.g., whether the pool is the whole 
of long-term memory or just a subset. 


The MAC stage 

The “Many are called” initial process is meant to be a cheap, fast 
process whose job is to select a manageable number of possible analogs to 
pass along to the “Few are chosen” stage. The MAC stage aims at a quick 
estimate of the overlap between the probe and each item in the memory 
pool. One way to do this would be to carry out the first part of a full 
analogy process for each possible pair, and then simply count the match 
hypotheses for each pair rather than completing the structural alignment. 
This technique was used in our original version of MAC/FAC (Gentner, 
1989b) as well as in ARCS (Thagard, Holyoak, Nelson, & Gochfeld, 
1990). The problem with this technique is that it is very costly. Even with 
parallel and/or neural hardware, it is unlikely that generating match hy- 
pothesis networks between a probe and everything in the pool of memory 
can be accomplished quickly enough to provide realistic response times 
for a large memory pool. 

The method we adopted is to associate with each structured represen- 
tation a content vector. Content vectors are flat summaries of the knowl- 
edge encoded in complex relational structures. The content vector for a 
given description specifies which predicates were used in that description 
and the number of times they occurred. (Here “predicate” is used in the 
inclusive sense to include functions, connectives, object attributes, 
relations, and so on.) Thus if there were four occurrences of IMPLIES in a 
story, the value for the IMPLIES component of its content vector would 
be four. (Other values are possible depending on the normalization used; 
see Gentner & Forbus, 1991, in preparation.) Content vectors are 
assumed to arise automatically from structured representations and to 
remain associated with them. 

On this account, initial access occurs at the level of content vectors and 
later stages of access and reasoning occur over structural descriptions. In 
the initial (MAC) stage, a dot product is taken between the content vector 
of the probe item and the content vector of each item in the memory pool. 
The output of MAC is the best match and everything within 10% of it. 
These pairs are passed to the FAC stage. 
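
The MAC computation can be sketched in a few lines. This is a minimal illustration under stated assumptions: descriptions are nested tuples of the form (predicate, arg, ...), content vectors hold raw counts with no normalization, and "within 10%" is read as a score of at least 90% of the best score. The function names and the example stories are hypothetical.

```python
from collections import Counter


def content_vector(description):
    """Count how often each predicate occurs in a list of nested statements."""
    counts = Counter()

    def walk(expr):
        if isinstance(expr, tuple):      # (predicate, arg1, arg2, ...)
            counts[expr[0]] += 1
            for arg in expr[1:]:
                walk(arg)                # object names (strings) are skipped

    for statement in description:
        walk(statement)
    return counts


def dot(u, v):
    """Dot product of two sparse count vectors."""
    return sum(u[k] * v[k] for k in u.keys() & v.keys())


def mac(probe, memory, window=0.10):
    """Return the best-scoring memory item and every item within `window` of it."""
    probe_vec = content_vector(probe)
    scores = {name: dot(probe_vec, content_vector(desc))
              for name, desc in memory.items()}
    best = max(scores.values())
    return [name for name, s in scores.items() if s >= (1 - window) * best]


probe = [("CAUSE", ("ATTACK", "n1", "n2"), ("DEFEND", "n2", "n1")),
         ("NATION", "n1"), ("NATION", "n2"), ("WEAK", "n2")]
memory = {
    "surface_match": [("NATION", "a"), ("NATION", "b"), ("WEAK", "b"),
                      ("ATTACK", "a", "b")],
    "analogy_match": [("CAUSE", ("ATTACK", "h", "k"), ("DEFEND", "k", "h")),
                      ("BIRD", "k")],
    "unrelated": [("TREE", "t"), ("GROW", "t")],
}

print(content_vector(probe))
print(mac(probe, memory))
```

In this toy memory the item sharing object attributes outscores the item sharing only the causal system, because shared attributes add to the dot product exactly as shared relations do. No bindings are consulted at this stage.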


The FAC stage 

The FAC stage is responsible for structural alignment, interpretation 
and evaluation of the best matches from the MAC stage. We use SME in 
literal similarity mode as the FAC matcher. We use literal similarity 
rather than analogy on the assumption that the normal criterion for suc- 
cessful retrieval is a good overall match. (Recall that literal similarity is 
structurally sensitive like analogy, but more inclusive.) FAC operates on 
the structural representations, not the content vectors. It carries out a 
structural alignment of each of the MAC output items with the probe, 
resulting in an interpretation, a set of new candidate inferences, and a 
structural evaluation. Thus, at the FAC stage, issues of parallel structural 
binding become important. FAC will reject many of the matches given to 
it by MAC. 

FAC thus acts as a structural filter on the output of MAC. This allows 
a divide-and-conquer strategy. The MAC stage is cheap and could plau- 
sibly be carried out in parallel for every item in the memory pool, perhaps 
in a connectionist architecture. To be sure, MAC’s estimate of similarity 
as the dot product of content vectors has critical limitations. Like feature- 
vector schemes, it does not take the actual relational structure into ac- 
count. It only produces a numerical score, and hence doesn’t produce the 
correspondences and candidate inferences which provide the power of 
analogical reasoning and learning. But these limitations, in combination 
with FAC’s structural filter, may be what is needed to model our highly 
efficient but somewhat fallible memory. 
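
The divide-and-conquer arrangement can be sketched as a generic two-stage filter. The sketch makes several assumptions that go beyond the text: the second stage applies the same "best and within 10%" selection rule as the first, the cheap scorer counts shared predicate names, and the structural scorer is a crude stand-in that counts identical statements (it is not SME and does no object mapping). All names are hypothetical.

```python
def within_window(scores, window=0.10):
    """Keep the best-scoring item and every item scoring within `window` of it."""
    best = max(scores.values())
    return {name: s for name, s in scores.items() if s >= (1 - window) * best}


def mac_fac(probe, memory, cheap_score, structural_score, window=0.10):
    """Return (items surviving both stages, items passed on by the cheap stage)."""
    mac_scores = {name: cheap_score(probe, item) for name, item in memory.items()}
    candidates = within_window(mac_scores, window)
    # The expensive matcher runs only on candidates that survived the cheap stage.
    fac_scores = {name: structural_score(probe, memory[name]) for name in candidates}
    return list(within_window(fac_scores, window)), list(candidates)


def shared_predicates(a, b):
    """Cheap stand-in for a content-vector match: overlap in predicate names."""
    return len({s[0] for s in a} & {s[0] for s in b})


def shared_statements(a, b):
    """Crude stand-in for a structural match: statements with identical bindings."""
    return len(set(a) & set(b))


probe = [("ATTACK", "a", "b"), ("WEAK", "b"), ("NATION", "a")]
memory = {
    "same_bindings": [("ATTACK", "a", "b"), ("WEAK", "b"), ("NATION", "a")],
    "crossed_bindings": [("ATTACK", "a", "b"), ("WEAK", "a"), ("NATION", "b")],
    "unrelated": [("GROW", "t")],
}

retrieved, passed_by_mac = mac_fac(probe, memory, shared_predicates, shared_statements)
print(passed_by_mac, retrieved)
```

The item with crossed bindings uses exactly the same predicates as the probe, so the cheap stage cannot tell it apart from the true match; it is rejected only when bindings are checked in the second stage. The unrelated item never reaches the expensive matcher at all.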

How successful is this two-stage model? We briefly summarize some 
comparisons of MAC/FAC’s performance with the psychological data. 
(For the full set of studies, see Gentner & Forbus, 1991; in preparation.) 
Across the three experiments, the human subjects showed a stable 
retrievability order: LS ≥ SS > AN ≥ FOR (where “≥” means 
“greater than or equal to”). For example, in Experiment 2 the 
proportions of remindings were .56 for LS, .53 for SS, .12 for AN and .09 for 
FOR matches. The question is whether MAC/FAC can duplicate the hu- 
man pattern: in particular, the propensity for retrieving SS and LS 
matches rather than AN and FOR matches. 


Comparing MAC/FAC with Human Performance 


For these simulations, we encoded predicate calculus representations 
for 9 of the 20 story sets (45 stories).¹² Then we gave MAC/FAC different 
memory sets and different probe types and recorded retrievals. To count 
as a retrieval, a story had to pass both MAC and FAC. 

¹² To have encoded all 20 story sets plus the 12 distractors would have required a 
prohibitive amount of encoder time. 

For example, in one study, MAC/FAC’s memory set consisted of all 
four variants of each of the nine base stories (a memory set of 36 stories). 
Each base story in turn was used as a probe. This is a more difficult task 
than the subjects’, for whom there was one and only one memory match 
for each probe story; but in partial compensation, MAC/FAC had nothing 
corresponding to the subjects’ delay between study and test. In any case 
it is not absolute difficulty but order of difficulty that is at stake here. 

Table 8 shows the mean number of matches of different similarity types 
that pass both MAC and FAC. There are several points to note. First, the 
retrieval results (i.e., the number that make it through both stages) ordi- 
nally match the results for human subjects: LS > SS > AN > FOR. This 
degree of fit is reasonably encouraging. Second, as expected, MAC pro- 
duces some matches that are rejected by FAC. The mean number of 
memory items produced by MAC is 3.3 and the mean number accepted by 
FAC is 1.5. Third, as expected, FAC succeeds in its job as a structural 
filter on the MAC matches. It accepts all of the LS matches proposed by 
MAC and some of the partial matches (the SS, AN and FOR matches), 
and rejects most of the inappropriate matches (the ‘‘other’’ matches from 
different story sets). It might seem puzzling that FAC accepts more SS 
matches than AN matches, when it normally would prefer AN over SS. 
The reason is that it is not offered this choice; it must take the best of the 
matches passed on by MAC for a given probe. 

We have tried several other variants with similarly encouraging results. 
For example, in another study, MAC/FAC was given the 9 base stories in 
memory along with the 9 FOR stories, which served as distractors. We 
then used each of the variants—LS, SS, and AN—as probes. The pro- 
portion of times the base story made it through both MAC and FAC was 
1.0 for LS, .89 for SS, and .67 for AN. Again the results ordinally match 
those of human subjects. 

TABLE 8 
Results of MAC/FAC Simulation: Mean Numbers of Different Match Types Retrieved 
When Base Stories Are Used as Probes 

Retrievals      MAC      FAC 
LS               .78      .78 
SS               .67      .44 
AN               .33      .11 
FOR              .22      0 
Other           1.33      .22 

Notes. Memory contains 36 stories (LS, SS, AN, and FOR for 9 story sets); the probes 
were the 9 base stories. Other = any retrieval from a story set different from the one to 
which the base belongs. 


We can compare MAC/FAC’s performance with that of the closest 
comparison model, Thagard, Holyoak, Nelson, and Gochfeld’s (1990) 
ARCS model of similarity-based retrieval. Thagard et al. gave ARCS a 
memory set consisting of the Karla the hawk story in memory along with 
100 Aesop’s fables as distractors. When given the four similarity variants 
as probes, ARCS produced asymptotic activations as follows: LS (.67), 
FOR (-.11), SS (-.17), AN (-.27). These results differ from the LS ≥ 
SS > AN ≥ FOR order found for human remindings. In particular, SS 
remindings, which should be about as likely as LS remindings, are quite 
infrequent. In contrast, MAC/FAC produces the same ordinal sequence, 
explaining the data better than ARCS. This is especially interesting be- 
cause Thagard et al. argue that a complex localist connectionist network 
which integrates semantic, structural, and pragmatic constraints is re- 
quired to model similarity-based reminding. While such highly integrated 
models are intriguing, MAC/FAC shows that a simpler model can provide 
a better account of the data. 

There are open questions with MAC/FAC. For example, at present it 
always produces at least one match for every probe. We are experiment- 
ing with the use of thresholds to capture cases where nothing is recalled. 
Also, MAC/FAC does not capture effects of multiple simultaneous inter- 
actions of probes with memory items (e.g., Hintzman, 1984, 1986; Medin 
& Schaffer, 1978), and competition among retrieval items is less integral 
than in ARCS (Thagard, Holyoak, Nelson, & Gochfeld, 1990; Wharton, 
Holyoak, Downing, Lange, & Wickens, 1991, 1992). 

However, MAC/FAC’s overall pattern of behavior captures the moti- 
vating phenomena. It allows for structured representations and for pro- 
cesses of structural alignment and mapping over these representations 
(criteria 1 and 2). It produces a small number of analogical matches, thus 
satisfying criterion 3, the existence of rare insights criterion. The majority 
of its retrievals are LS matches, thus satisfying criterion 4, the primacy of 
the mundane. It also produces a fairly large number of SS matches, thus 
satisfying criterion 5, surface superiority. Finally, its algorithms are sim- 
ple enough to apply over large-scale memories, thus satisfying criterion 6, 
scalability. 


GENERAL DISCUSSION 


We began with the venerable claim “Similarity is central in transfer.”
Our results suggest that the proper terms are “Similarities” and “transfer
processes.” An account of similarity’s effects on transfer requires making
fine distinctions both about similarity and about transfer. Specifically, we 
found a dissociation between the matches people get from memory and 
the matches they want. The accessibility of matches from memory was 
strongly influenced by surface commonalities and weakly influenced by 
structural commonalities. In contrast, rated inferential soundness of comparisons
was strongly influenced by structural commonalities and not at
all influenced by surface commonalities. Subjective similarity itself was
influenced by both surface and structural commonalities. 


Soundness 


The results bear out the structure-mapping prediction that the subjec- 
tive soundness of a similarity match is determined by the degree to which 
the analogs share relational structure. Three aspects of the results lend
support to this prediction. First, adding common relations increases the 
perceived soundness of a match. In every case where a precise comparison
is possible, soundness increases with the addition of relational commonalities.
When higher-order relations were added in Experiments 1 and
2, analogies (R1 + R2) were judged more sound than first-order matches
(R1); and similarly in Experiment 2, literal similarity matches (OA + R1
+ R2) were judged more sound than surface matches (OA + R1). When
first-order relations were added in Experiment 3, surface matches (OA +
R1) were judged more sound than objects-only matches (OA).

The second finding is that adding higher-order commonalities to a first- 
order relational match increases its soundness more than adding object 
commonalities. In all three experiments the soundness of analogical 
matches (R1 + R2) is substantially higher than that of surface matches
(OA + R1). By itself this result might simply mean that the number or
salience of features was greater for higher-order relational features than 
for object features. But the finding of a surface advantage in memory 
access renders this explanation implausible, or at least forces us to as- 
sume a different salience weighting for retrieval than for subjective sound- 
ness; and this is our point. 

The third piece of evidence for structural specificity in soundness judg- 
ments is that the addition of object-attribute commonalities fails to in- 
crease soundness. The conditions for this test are met once in Experiment 
1 and twice in Experiment 2. In Experiment 1, surface matches (OA + 
R1) were considered no more sound than first-order event matches (R1) 
(M = 2.80 and M = 2.74, respectively). In Experiment 2, surface 
matches (OA + R1) (M = 2.70) were not significantly more sound than 
FOR matches (R1) (M = 2.58). Literal similarity matches (OA + R1 +
R2) (M = 4.41) were not significantly more sound than analogies (R1 +
R2) (M = 4.16). Without embracing the null hypothesis, we can safely say
that these studies produced no evidence that adding object commonalities 
increases soundness. Overall, the results indicate that subjective sound- 
ness reflects the degree of common relational structure. 

These results are compatible with prior findings concerning aptness of 
metaphor. Subjects’ ratings of the aptness of metaphors were positively 
correlated with the degree of relationality of their interpretations (as 
judged by independent judges) and either uncorrelated or negatively correlated
with the attributionality of their interpretations (Gentner, 1988b;
Gentner & Clement, 1988). Finally, the correlation between our subjects’
soundness ratings and SME’s structural evaluation scores is consistent
with the claim that the human perception of inferential soundness is based
on the degree of structural match.


Similarity-Based Access 


The results for access were almost the reverse of those for soundness.
Subjects tended not to retrieve the matches they rated as
sound. Instead, they were most likely to access surface matches. As
above, there are three lines of evidence. First, adding common object
attributes to a match consistently increased the proportion retrieved. In
Experiments 1 and 2, recall of surface matches (OA + R1) was greater
than recall of FOR matches (R1) by at least a factor of three. In Experiment
2, recall of literal similarity matches (OA + R1 + R2) was greater
than recall of analogical matches (R1 + R2) by about a factor of five.
Second, adding common object attributes contributes more to retrievability
than adding common higher-order relations (in contrast to the soundness
results). In all three studies, recall of surface matches (OA + R1)
exceeded recall of analogical matches (R1 + R2). The proportions of
surface matches retrieved across the three studies were .78, .53, and .52,
respectively, substantially higher than the retrieval rate for analogies (.44, ...).
Finally, given this surface advantage, we can ask whether
adding higher-order relational commonalities does anything to increase
accessibility. The evidence here is mixed. In Experiment 2 analogy (R1 +
R2) (M = .12) was no better recalled than first-order matches (R1) (M =
.10) and literal similarity (OA + R1 + R2) (M = .56) no better than SS
(OA + R1) (M = .53). However, in Experiment 1, analogies (R1 + R2)
(M = .44) were retrieved more often than first-order matches (R1) (M =
.25). Taken in combination with other studies, these findings suggest that
higher-order relational similarities can sometimes promote access, but 
that the effects are less robust than the effects for surface matches. We 
will return below to the issue of when relational access is most likely. 


Similarity 
Accessibility and subjective soundness are both aspects of similarity in 
transfer. Their divergent patterns leave us in something of a bind:
which is the real similarity? We therefore asked subjects to rate similarity
directly. If subjective similarity had accounted for accessibility, then it
might have been possible to argue for a unary notion of similarity in that
accessibility is governed by similarity and to deal with the divergent
soundness patterns in some other way. However, subjective similarity
resembled soundness in its sensitivity to common relational structure.
Adding relational information increased subjective similarity in every 
case where the comparison can be made. Literal similarity matches (OA
+ R1 + R2) (M = 4.5) were considered more similar than surface
matches (OA + R1) (M = 3.4) (Experiment 2); analogical matches (R1 +
R2) (M = 4.09) were considered more similar than FOR-matches (R1) (M
= 2.88) (Experiment 2); and surface matches (OA + R1) (M = 3.21) were
considered more similar than object-only matches (OA) (M = 2.3) (Experiment
3). The second line of evidence is the comparative addition
argument. If, starting with a first-order match, we compare the effects of
adding higher-order commonalities versus adding object commonalities,
we find that analogies (R1 + R2) are rated as more similar than surface
matches (OA + R1) in both Experiment 2 and Experiment 3 (M = 4.01
and M = 3.21, respectively). But unlike soundness, similarity is also
increased by adding object commonalities. Literal similarity matches (OA
+ R1 + R2) are considered more similar than analogy matches (R1 +
R2); and surface matches (OA + R1) are considered more similar than
FOR-matches (R1) (Experiment 2). It appears that both structural and 
surface similarities contribute to subjective similarity. 


Similarity and Structural Alignment 


The finding that common relational structure contributes to similarity is 
consistent with other recent findings. For example, Goldstone, Medin and 
Gentner (1991) and Medin, Goldstone, and Gentner (1990) found that 
when subjects were asked to choose between a relational match and an 
attributional match to a standard, they generally preferred the relational 
match: e.g., XX would be considered more similar to TT than to XT. 
Further, this relational bias is affected by whether relational or attribu- 
tional commonalities already predominate (Goldstone, et al., 1991). An- 
other source of evidence for the role of relational structure in similarity is 
configural effects in perceptual similarity. Pomerantz, Sager, and Stoever 
(1977) found that subjects could more easily discriminate between the 
stimuli ) and ( when they were presented in the context of a third identical 
element, ), yielding the discrimination )) vs. (). Pomerantz et al. sug- 
gested that the effect of adding this contextual component was to promote 
emergent features, such as symmetry and intersection. The added com- 
ponent introduced different relational structures in the two stimuli, mak- 
ing them more dissimilar and hence more discriminable. (See also Lock- 
head & King, 1977; Palmer, 1977, 1978, 1989). 

The relationship between similarity and structural alignment is under- 
scored by a finding of Markman and Gentner’s (1990, in press). Subjects 
were asked to map between two pictures that were constructed to contain 
cross-mappings: e.g., XYZYX, voxov. Subjects who judged the similarity
of the two scenes before doing the mapping task were more likely to
base their mapping on an alignment of relational structure rather than 
using local object matches: e.g., they would map X onto v in the above 
pair, rather than X onto x. These findings suggest that aligning relational 
structure is integral to the perception of similarity and support the claim 
that the computation of similarity utilizes common structural connectivity 
among features, rather than simply counting numbers of matching fea- 
tures (Falkenhainer, Forbus & Gentner, 1989; Goldstone & Medin, in 
press; Holyoak & Thagard, 1989; Hofstadter & Mitchell, in press). 


The Plurality of Similarity 


The dissociation between surface similarity and structural similarity 
across different processes is related to several recent discussions. Medin, 
Goldstone, and Gentner (in press) and Gentner (1989) have argued that 
similarity is pluralistic, in the sense that there are multiple subclasses of 
similarity and multiple influences on how it is computed. Rips (1989) 
demonstrated a dissociation between similarity, typicality, and categori- 
zation. Murphy and Medin (1985) and Keil (1989) have commented on the 
limited usefulness of simple similarity and pointed out that physical re- 
semblance does not provide a sufficient basis for conceptual structure. 
Barsalou’s (1982) ad hoc categories, such as “things to take on a picnic,”
and Glucksberg and Keysar’s (1991) metaphorically based categories,
such as “jail” as a prototypical confining institution, are examples of
abstract or relational commonalities. The present results argue specifi- 
cally against the notion of a unitary similarity that governs retrieval, 
evaluation and inference. 


Similarity-Based Retrieval: MAC/FAC


The fact that access to memory is surface-driven cannot be taken to 
mean that the memories themselves contain only surface features. Gick 
and Holyoak’s (1980, 1983) studies and our Experiment 4 demonstrate 
that even when subjects are not able to use structural matches to access 
memory, they nonetheless may possess that information. The MAC/FAC 
simulation is aimed at capturing both these facts: that humans success- 
fully store and retrieve intricate relational structures—and that access to 
these stored structures is heavily—although not entirely—surface-driven. 
MAC/FAC is a two-stage retrieval model whose first stage is attentive to
content and blind to structure and whose second stage is structurally
sensitive. MAC/FAC’s first stage simply computes the dot product of the
probe’s content vector (a simple list of all the predicates contained in the
structural description) with that of each item in the memory pool. Its
second stage computes a structural similarity alignment and evaluation of
the match between the probe and each of the top items from the first 
stage. There are several appealing features of this model. First, since the 
initial surface match does not require computing structural consistency, it 
is computationally cheap; yet, if surface and structural information is 
correlated (as in ordinary experience), there will be many literal similarity 
matches. This economy means that the first stage sometimes fails to 
produce a legitimate structural match, capturing the cases when our sub- 
jects considered their own retrievals to be structurally unsound. Second, 
unlike most case-based retrieval systems, the representations do not need 
to be indexed by an intelligent agent. The content vector acts as a kind of 
automatic index—a dumb index, to be sure, since it simply lists all the 
predicates in the representation, but one that may fit human performance. 
Third, the system fits the data so far fairly well. It displays the appropriate 
primacy of mundane and surface matches, and it occasionally retrieves 
purely relational matches. 
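
The two-stage scheme described above can be illustrated with a minimal sketch. It is not the MAC/FAC implementation: the nested-tuple story encodings, the predicate names, the `keep_fraction` cutoff, and above all `structural_score` (a crude stand-in for SME that ignores one-to-one object correspondences and weights deeper expressions by an arbitrary squared depth) are all assumptions made for illustration.

```python
from collections import Counter


def content_vector(description):
    """Count each predicate in a description (a list of nested tuples)."""
    counts = Counter()

    def walk(expr):
        if isinstance(expr, tuple):      # (predicate, arg1, arg2, ...)
            counts[expr[0]] += 1
            for arg in expr[1:]:
                walk(arg)                # entities (plain strings) are skipped

    for expr in description:
        walk(expr)
    return counts


def dot(u, v):
    """Dot product of two predicate-count vectors."""
    return sum(n * v[pred] for pred, n in u.items())


def shape(expr):
    """Blank out entity names so that only predicate structure remains."""
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(shape(a) for a in expr[1:])
    return "_"


def depth(expr):
    """1 for a first-order expression, 2 for a relation over relations, etc."""
    if isinstance(expr, tuple):
        return 1 + max((depth(a) for a in expr[1:]), default=0)
    return 0


def structural_score(probe, item):
    """Toy stand-in for SME (assumption): credit each probe expression whose
    shape also occurs in the item, weighting deeper structure more heavily."""
    item_shapes = {shape(e) for e in item}
    return sum(depth(e) ** 2 for e in probe if shape(e) in item_shapes)


def mac_fac(probe, memory, keep_fraction=0.75):
    probe_cv = content_vector(probe)
    # First stage: cheap, structure-blind filter over the whole memory pool.
    mac = {name: dot(probe_cv, content_vector(d)) for name, d in memory.items()}
    best = max(mac.values())
    survivors = [n for n, s in mac.items() if s >= keep_fraction * best]
    # Second stage: structural evaluation of the survivors only.
    scored = [(n, structural_score(probe, memory[n])) for n in survivors]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical encodings: OA = object attributes, R1 = first-order relations,
# R2 = the higher-order "cause" relation.
probe = [("bird", "percy"), ("rodent", "squirrel"),
         ("cause", ("offer", "percy", "squirrel"), ("refuse", "squirrel", "percy"))]
memory = {
    "LS": [("bird", "sam"), ("rodent", "chip"),
           ("cause", ("offer", "sam", "chip"), ("refuse", "chip", "sam"))],
    "SS": [("bird", "sam"), ("rodent", "chip"),
           ("offer", "sam", "chip"), ("refuse", "chip", "sam")],
    "AN": [("person", "sam"), ("person", "mother"),
           ("cause", ("offer", "sam", "mother"), ("refuse", "mother", "sam"))],
    "OTHER": [("bird", "tweety"), ("cat", "tom"), ("chase", "tom", "tweety")],
}
result = mac_fac(probe, memory)
print(result)
```

With these toy encodings the first stage passes only the literal similarity and surface matches, so the analogy is never evaluated even though the stand-in structural score would rank it above the surface match. This mirrors, in miniature, the dissociation between the matches people get and the matches they want.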

Comparison with current approaches. MAC/FAC’s nearest neighbor 
among models of memory retrieval is ARCS (Thagard, Holyoak, Nelson, 
& Gochfeld, 1990). Like MAC/FAC, ARCS assumes that reminding is 
based on a combination of similarity of concepts and structural isomor- 
phism, although ARCS also assigns a key role in reminding to pragmatic 
centrality. ARCS sets up a competitive network between the probe and all 
semantically related items, using excitatory and inhibitory connections to 
model the above pressures. The item that the network settles on is considered
to be the item retrieved. ARCS and MAC/FAC have many similarities,
but they differ in a few important ways. MAC/FAC makes a
distinction between dumb initial processes in memory retrieval and smart 
later processes where ARCS does not. MAC/FAC often produces more 
than one retrieval; ARCS produces only one. ARCS utilizes competition 
among memory items in a manner integral to the computation to a greater 
extent than does MAC/FAC. On the other hand ARCS has the disadvan- 
tage that a potential for a combinatorial explosion is inherent in setting up 
full networks between the probe and each of a large pool of memory 
items. Although MAC/FAC accounts for the present results better than 
ARCS, more research will be required to sort out their respective advan- 
tages. 

MAC/FAC can also be compared with two important classes of mem- 
ory models: case-based reasoning models utilizing abstract structural in- 
dices and mathematical memory models utilizing feature vectors. These 
two approaches have different advantages and disadvantages. Case-based 
reasoning approaches are good at analogical mapping and inference but 
fail to show the surface superiority effect in access. Feature-vector models
are tractable for large data bases and can capture the lack of structural

Expertise and Relational Access


Despite the picture painted in the present research and in most
problem-solving research, there is evidence of considerable relational ...

... quicker to reject it than were novices. Similarly, J. Clement (1982, 1986) ...

The increase of relational reminding with expertise and with ...
can be accommodated in the MAC/FAC model. If experts’
representations are richer and better structured than those of
novices (Carey, 1985; Chi, 1978), then their encodings will contain more
higher-order relations and this could promote relational retrieval. For 
example, there is evidence that teaching children to notice and encode 
higher-order dimensional relations such as symmetry and monotonicity 
increases their ability to appreciate abstract cross-dimensional matches 
(Kotovsky & Gentner, 1990; in preparation). Equally important for access 
is the uniformity of the internal relational encoding (Gentner & Ratter- 
mann, 1991). Whereas novices’ knowledge may be only locally coherent, 
we conjecture that experts may use the same set of theoretical notions 
across the domain and that this promotes uniform relational encodings in 
the domain. When a given higher-order relational pattern is used to en- 
code situations, it will of course be automatically incorporated into MAC/ 
FAC’s content vectors. This means that any higher-order relational con- 
cept that is widely used in a domain will tend to increase the uniformity 
of the vectors’ entries (the ‘indices’ in CBR terms) and therefore the 
mutual accessibility of situations within the domain. Thus as experts 
come to encode a domain according to a uniform set of principles, the 
likelihood of appropriate relational remindings increases. A corollary of 
this view is that the use of uniform relational labels may be an important 
contributor to achieving analogical access (Catrambone, in preparation; 
Clement, Mawby, & Giles, 1991). 
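
The effect of uniform relational encoding on content-vector overlap can be shown with a small numerical sketch. The predicate names and counts below are hypothetical and chosen only for illustration; the raw dot product follows the description of the first stage given earlier.

```python
from collections import Counter


def dot(u, v):
    """Dot product of two predicate-count vectors (missing predicates count as 0)."""
    return sum(n * v[pred] for pred, n in u.items())


# Hypothetical novice encodings: objects and domain-specific first-order relations.
novice_water = Counter({"water": 1, "beaker": 1, "vial": 1, "pipe": 1, "flows_through": 1})
novice_heat = Counter({"coffee": 1, "ice_cube": 1, "bar": 1, "warms": 1})

# Hypothetical expert encodings: the same higher-order vocabulary is added to
# both situations, plus one domain-specific quantity each.
shared_theory = Counter({"cause": 1, "greater_than": 1, "flow": 1})
expert_water = novice_water + shared_theory + Counter({"pressure": 2})
expert_heat = novice_heat + shared_theory + Counter({"temperature": 2})

novice_overlap = dot(novice_water, novice_heat)
expert_overlap = dot(expert_water, expert_heat)
print(novice_overlap, expert_overlap)
```

In this sketch the two novice vectors share no predicates, whereas the expert vectors overlap on every shared relational label, so a cross-domain probe has a better chance of surviving the first stage.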

As domain expertise increases, MAC/FAC’s behavior may come to 
resemble that of a case-based reasoning model with some surface index- 
ing. We can think of its content vectors as indices with the property that 
they change automatically with any change in the representation of do- 
main exemplars. Thus as domain knowledge increases, MAC/FAC may 
have sufficiently elaborated indices to permit a fairly high proportion of 
relational remindings. The case-based reasoning emphasis on retrieving 
prior examples and generalizations that are inferentially useful may be a 
reasonable approximation to the way experts retrieve knowledge. 


Why Surface Access? 

These findings may leave us feeling schizophrenic. How can the human 
mind, at times so elegant and rigorous, be limited to this primitive re- 
trieval mechanism? An intriguing possibility is that in the evolution of 
cognition, retrieval from memory is an older process than inferential rea- 
soning over symbolic structures. We could thus think of our surface bias 
in retrieval as a vestige of our evolutionary past, perhaps even a mistake 
in design that we have never lived down. But regardless of whether the 
evolutionary account is correct, there are reasons that a surface bias 
might be a good design choice for humans. First, there is the kind world 
hypothesis: that in much of our experience surface information is strongly 
correlated with structural information. If something looks like a tiger, it
probably is one. Perceptual information being plentiful and easy for us to
process, it might be a good strategy for a being with a large store of
knowledge to take advantage of this correlation (Gentner, 1989; Medin & 
Ortony, 1989). The second point is the argument from scale. Human 
knowledge bases are vastly larger than any artificial base yet constructed. 
It is not clear that a structural indexing scheme would scale up properly 
to human memory size; we might find ourselves swamped with relational 
remindings. 
A third argument that surface access might sometimes be a good idea 
rests on considerations of learning, reminding-based generalization and 
the novice-expert shift. Let’s agree that a difference between novices and 
experts in a domain is that experts know the domain theory: in particular, 
they know the relational constructs for that domain (Chi, 1978; Chi, Fel- 
tovich, & Glaser, 1981; Gentner & Rattermann, 1991). Let us further 
conjecture that the optimal relational constructs are typically domain- 
specific and may differ considerably between domains. In such a case, for 
a learner to move quickly to an abstract structural description could lead 
to nonrecoverable errors. It would be adaptive for novices to utilize highly
conservative encodings with rich information about objects and contexts, 
and to add relational information gradually. If we further assume that 
reminding-based generalization can lead gradually to relational abstractions
(Elio & Anderson, 1981; Forbus & Gentner, 1986; Gick & Holyoak,
1983; Hayes-Roth & McDermott, 1978; Medin & Ross, 1989; Ross, 1989, 
in press; Skorstad, Gentner & Medin, 1988) then an initial conservative 
bias would best allow the relational vocabulary to develop according to 
the demands of the domain. This brings us to our last conjecture as to why 
an object-oriented retrieval strategy might be useful: that of incommensurability
of representations. If we accept Carey’s (1991) arguments that
fundamental changes in beliefs can occur with learning and development,
then it would be adaptive for human retrieval mechanisms to maintain
some reliance on those aspects of internal representations that change
least. If object concepts are relatively stable across theory
change, then maintaining some reliance on objects in retrieval preserves
access to early learning.


APPENDIX 
Sample Stimuli from Experiments 1, 2, and 3 


Base Story 


Percy the mockingbird spent the whole warm season chirping and twit- 
tering. When it began to get colder Percy visited a squirrel and sang a song 
for her, expecting to get some of the squirrel’s sunflower seeds in return. 
However, the squirrel was very disappointed in him.
“You are a terrible singer!” she yelled. “I’m not giving you any of my
wheat.”
A tear rolled down Percy’s cheek, and he vowed to give up singing for
good.


Literal-Similarity Match
A magpie named Sam sang all summer. When winter came he paid a
visit to a chipmunk and performed a ballad for her, hoping she would give
him some nuts in return. However, the chipmunk was not at all pleased.
“You don’t deserve any nuts of mine!” she exclaimed. “Your song was
terrible.”


Analogy Match 
Sam travelled all over the world buying beautiful things. When he ran 
out of money he paid a visit to his mother and gave her a gift he bought
while in Tibet, hoping she would give him a loan in return. However, his
mother was not at all pleased.
“You don’t deserve any money of mine!” she exclaimed. “This is a
piece of junk!”


Surface-Similarity Match
A magpie named Sam sang all summer. When winter came he paid a
visit to a chipmunk. However, the chipmunk was not at all pleased with
Sam.
“You have wasted the summer while I have been hard at work!” she
said. Sam performed a ballad for her hoping she would give him some nuts
in return. But she was still not pleased. “I will not give you any of my
nuts!” she exclaimed.


FOR Match 
Sam travelled all over the world buying beautiful things. When he ran
out of money he paid a visit to his mother. However, she was not at all
pleased. “While I have been hard at work you have been wasting your time,”
she said. Sam gave her a gift he bought in Tibet, hoping she would give
him a loan in return. But she was still not pleased. “I will not give you any
of my hard-earned money!” she exclaimed.


Objects-Only Match
One unusually warm spell in February Sam the magpie thought, “This
is my chance.” He stood up on the edge of his nest and trilled a song.
His song was so loud and cheerful that it woke a nearby chipmunk. The
chipmunk asked for another song. He was so moved by Sam’s song that
he forgot it was still winter and decided to go looking for nuts to store.